Hello everyone, welcome back to our blog! Last time, we shared how to take the very first step of developing an NLP trading strategy: collecting the data. In this post, we would like to show how to test trading strategies with Python, using the data collected in the last post as an example.
Catch the mood, and… short it
In quantitative trading, investment managers design strategies that automatically buy (take a long position) or borrow and sell (take a short position) when there is a designated trading signal, i.e. when a parameter crosses a certain threshold. In our strategy, the position is set at each daily close and held until the next one. The best strategy, of course, is the one that generates the most profit, and its profitability is measured by the daily profit and loss (pnl) that it generates. Testing has two steps: the backtest and the walk-forward test. The backtest searches historical data for the optimal parameter value to use as the trading signal, and the walk-forward test then checks the strategy's performance on more recent data to validate its out-of-sample profitability. If a strategy achieves a good Sharpe ratio in both tests (preferably above 1), it is considered a good strategy.
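As a reference for the Sharpe ratio criterion, here is a minimal sketch of how an annualized Sharpe ratio can be computed from a daily pnl series. It assumes 252 trading days per year and a zero risk-free rate, matching the `sqrt(252)` scaling in the code later in this post. The helper name `annualized_sharpe` and the example numbers are our own, for illustration only.

```python
import math

import pandas as pd


def annualized_sharpe(daily_pnl: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio of a daily pnl series (risk-free rate assumed to be 0)."""
    pnl = daily_pnl.dropna()
    std = pnl.std()  # sample standard deviation (ddof=1), the pandas default
    if len(pnl) < 2 or std == 0 or math.isnan(std):
        return float("nan")  # undefined when the pnl never varies (e.g. no trades)
    return pnl.mean() / std * math.sqrt(periods_per_year)


# Made-up daily pnl values, not results from our strategy
example = pd.Series([0.01, -0.005, 0.02, 0.0])
print(annualized_sharpe(example))
```

Returning NaN for a flat pnl series avoids dividing by a zero standard deviation, which happens when a threshold is so high that the strategy never trades.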
Illustration of how to calculate daily pnl
P stands for the closing price every day. TC stands for the unit transaction cost.
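Written out, the daily pnl computed by the code later in this post is shown below, where pos_t is the position (1 long, -1 short, 0 flat) decided at the close of day t. This is our reading of the code, not a copy of the original illustration.

```latex
\[
\mathrm{pnl}_t = \mathrm{pos}_{t-1} \cdot \frac{P_t - P_{t-1}}{P_{t-1}}
  - \left| \mathrm{pos}_t - \mathrm{pos}_{t-1} \right| \cdot TC,
\qquad \mathrm{pos}_t \in \{-1, 0, 1\}
\]
```

The first term is the return earned by yesterday's position; the second charges TC for every unit by which the position changes today.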
In our project, we try to predict changes in bitcoin futures prices from bitcoin-related tweets. Tweets about bitcoin reflect the market's expectations for bitcoin: if the mood is optimistic and cheerful, more people will buy bitcoin, driving up the futures contract price; if the mood is negative and pessimistic, people will sell bitcoin and the futures price will fall. We therefore designed a trading strategy that uses the tweets' sentiment score as its signal. In other words, our model has only one parameter: the sentiment score threshold. We run a backtest to find the optimal threshold for deciding our position.
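The one-parameter rule can be sketched as a small function that maps a daily score to a position, using the same symmetric threshold as the optimizer code later in this post. The function name `sentiment_signal` and the sample scores are hypothetical.

```python
import numpy as np
import pandas as pd


def sentiment_signal(score: pd.Series, threshold: float) -> pd.Series:
    """Map a daily sentiment score to a position: +1 (long), -1 (short) or 0 (flat).

    Long above +threshold, short below -threshold, flat in between.
    """
    position = np.where(score > threshold, 1, np.where(score < -threshold, -1, 0))
    return pd.Series(position, index=score.index, name="posi")


scores = pd.Series([0.12, -0.08, 0.03, -0.20])  # illustrative scores, not real data
print(sentiment_signal(scores, 0.05).tolist())  # [1, -1, 0, -1]
```

Raising the threshold makes the strategy trade less often; at a high enough threshold it never trades at all.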
Now let’s look at an example
After downloading bitcoin tweets as described in the last post, we used the VADER sentiment analysis tool to generate a sentiment score for each tweet. As the main topic of this post is strategy testing, we will not go into the details of sentiment analysis. We then aggregated the scores into a daily time series (the sum and the average of each day's tweet scores) and merged it with the daily close price of the bitcoin futures contract.
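A minimal sketch of the aggregation and merge with pandas is shown below. The per-tweet scores and timestamps are made up (they stand in for VADER compound scores), and the column names `created_at` and `compound` are assumptions; the two close prices are taken from the extract below.

```python
import pandas as pd

# Hypothetical per-tweet sentiment scores (stand-ins for VADER compound scores in [-1, 1])
tweets = pd.DataFrame(
    {
        "created_at": pd.to_datetime(
            ["2017-12-18 09:00", "2017-12-18 15:30", "2017-12-19 11:00"]
        ),
        "compound": [0.40, -0.10, 0.25],
    }
)

# Daily futures close prices indexed by date
close = pd.Series(
    [19270.0, 18415.0],
    index=pd.to_datetime(["2017-12-18", "2017-12-19"]),
    name="close",
)

# Aggregate tweet scores per calendar day: sum and average
daily = (
    tweets.groupby(tweets["created_at"].dt.normalize())["compound"]
    .agg(["sum", "mean"])
    .rename(columns={"sum": "score (sum)", "mean": "score (avg)"})
)

# Left join on the trading days; days without tweets get NaN scores
merged = close.to_frame().join(daily, how="left")
print(merged)
```

A left join keeps every trading day, so days without tweets show up as missing scores, like the #N/A rows in the extract.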
Date | Close price | Score (sum) | Score (avg) |
---|---|---|---|
12/15/2017 | 19700 | #N/A | #N/A |
12/18/2017 | 19270 | 0.445 | 0.0371 |
12/19/2017 | 18415 | 0.8111 | 0.0386 |
12/20/2017 | 17300 | 2.803 | 0.1078 |
12/21/2017 | 15595 | #N/A | #N/A |
12/22/2017 | 14415 | 0.4429 | 0.0148 |
12/26/2017 | 16035 | 2.1684 | 0.1668 |
12/27/2017 | 14940 | 1.4404 | 0.1309 |
12/28/2017 | 13970 | 2.6372 | 0.1758 |
12/29/2017 | 14535 | 3.8037 | 0.1153 |
Extract from the merged data frame.
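To see what forward filling does to this extract, the snippet below loads its first six rows (close price and average score, with #N/A as NaN) and applies the same forward fill, dropna and return calculation as the setup code later in this post.

```python
import numpy as np
import pandas as pd

# First six rows of the extract above; #N/A becomes NaN
df = pd.DataFrame(
    {
        "close": [19700, 19270, 18415, 17300, 15595, 14415],
        "score (avg)": [np.nan, 0.0371, 0.0386, 0.1078, np.nan, 0.0148],
    },
    index=pd.to_datetime(
        ["2017-12-15", "2017-12-18", "2017-12-19", "2017-12-20", "2017-12-21", "2017-12-22"]
    ),
)

df = df.ffill().dropna()                 # carry the last score forward, drop the leading NaN row
df["p_%chg"] = df["close"].pct_change()  # daily return of the futures close
print(df)
```

12/21 inherits the 12/20 score (0.1078), while 12/15 has no earlier score to fill from and is dropped, so the first remaining return is NaN.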
Step-by-step process for testing on the time series data:
- Forward fill the sentiment score where it is missing
- Calculate the daily return of bitcoin futures, since a position is held for one day when there is a trading signal
- Apply the trading logic to the historical data: if the sentiment score is above the long threshold (e.g. 0.6), take a long position; if it is below the short threshold (e.g. -0.6), take a short position; if it is between the two thresholds, take no position
- Obtain the adjusted pnl time series for different parameters after incorporating transaction costs (commissions, bid-ask spread and slippage). In our model, we assume that the bid-ask spread and slippage are negligible.
- To do this, we loop over a range of thresholds defined by a bound and a step, and calculate the pnl for every threshold in the range
- Split the adjusted pnl into a training set and a test set, and calculate the Sharpe ratio of each
- List the top 5 parameters with the largest Sharpe ratios in the training set, together with their Sharpe ratios in both the training and test sets
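The return and adjusted-pnl steps above can be checked by hand on a toy price series. This sketch mirrors the pnl line in the optimizer code: yesterday's position earns today's return, and each unit of position change costs `tc`. The function name `adjusted_pnl`, the prices and the cost of 0.001 are illustrative assumptions.

```python
import pandas as pd


def adjusted_pnl(position: pd.Series, close: pd.Series, tc: float) -> pd.Series:
    """Daily pnl net of transaction costs (tc is a fraction of notional per unit traded)."""
    ret = close.pct_change()  # daily return of the close price
    return position.shift(1) * ret - position.diff().abs() * tc


close = pd.Series([100.0, 110.0, 99.0, 99.0])
position = pd.Series([1, -1, -1, 0])  # long, flip to short, hold, go flat
pnl = adjusted_pnl(position, close, tc=0.001)
print(pnl.tolist())
```

Day 2 earns the long's +10% minus 2 units of cost for the flip (0.098), day 3 earns +10% on the short with no cost, and day 4 only pays the cost of closing (-0.001). The first day is NaN, so the cost of the very first entry is not charged.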
Let’s take a look at the code now.
Import the required packages and set up the basic variables.

```python
# Import the packages needed
from math import sqrt

import numpy as np
import pandas as pd

# Input the variable to backtest (sum: bound = 12, step = 0.1; avg: bound = 0.6, step = 0.01)
## add plot for historical data step 5% of total range
score = 'score (avg)'
bound = 0.6
step = 0.01

# DataFrame and other variables setup
# (the file is assumed to have a date index and 'close', 'score (sum)', 'score (avg)' columns)
df = pd.read_csv('YOUR_CSV', index_col=0, parse_dates=True)
df = df.ffill().dropna()  # forward fill missing scores, then drop leading rows that are still empty
df['p_%chg'] = df['close'].pct_change()  # percentage change of daily close price
res = []  # list that collects the results

# Input transaction costs
# $10 per CME bitcoin futures contract (5 BTC at a reference price of $8,500), as a fraction of notional
tc = 10 / (8500 * 5)
```
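The cost line above expresses a fixed dollar commission as a fraction of one contract's notional value. Reading 5 as the number of bitcoins per CME contract and 8500 as a reference price (our interpretation), the arithmetic is sketched below; `tc_fraction` is a hypothetical helper.

```python
def tc_fraction(cost_per_contract: float, price: float, btc_per_contract: float = 5.0) -> float:
    """Transaction cost as a fraction of one contract's notional value."""
    return cost_per_contract / (price * btc_per_contract)


tc = tc_fraction(10, 8500)  # same inputs as the setup code
print(f"{tc:.6f} ({tc * 1e4:.2f} bps)")  # 0.000235 (2.35 bps)
```

Because the commission is fixed in dollars, the cost fraction halves when the price doubles, so using one reference price for the whole sample is a simplification.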
Parameter optimization: find the threshold with the highest Sharpe ratio.

```python
# Optimizer
# The easiest way to divide the training and test sets is a chronological 8:2
# in-sample to out-of-sample split; there are also other, more advanced ways to divide the data set
split = int(len(df) * 0.8)  # position of the first out-of-sample row

for i in np.arange(0, bound, step):
    df['posi'] = np.where(df[score] > i, 1, np.where(df[score] < -i, -1, 0))
    df['pnl'] = df['posi'].shift(1) * df['p_%chg'] - df['posi'].diff().abs() * tc
    pnl_is, pnl_os = df['pnl'].iloc[:split], df['pnl'].iloc[split:]
    # Annualized Sharpe ratios; a zero standard deviation (e.g. no trades) yields inf/NaN rather than an error
    sr_is = pnl_is.mean() / pnl_is.std() * sqrt(252)  # in-sample Sharpe ratio
    sr_os = pnl_os.mean() / pnl_os.std() * sqrt(252)  # out-of-sample Sharpe ratio
    res.append([sr_is, sr_os, i])

# Write the results to a DataFrame and print the thresholds with the best in-sample Sharpe ratios
opt = pd.DataFrame(res, columns=['sr_is', 'sr_os', 'thres'])
opt = opt.replace([np.inf, -np.inf], np.nan).dropna()  # drop undefined Sharpe ratios
opt = opt.sort_values(by='sr_is', ascending=False)
print(opt.head())
```
Example of output
In this example, the backtest works fine; however, none of the parameters passes the walk-forward test (the out-of-sample Sharpe ratios are too low or even negative).
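The 8:2 split is a single train/test split. One of the more advanced alternatives mentioned in the code comments is an expanding-window walk-forward split, sketched below as our own illustration (not the method used in this post): each fold trains on everything before a test window, then the window moves forward. `walk_forward_splits` is a hypothetical helper.

```python
from typing import Iterator, Tuple


def walk_forward_splits(n_obs: int, n_train: int, n_test: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) row ranges for an expanding-window walk-forward test.

    Rows are never shuffled, and each test window lies strictly after its training window.
    """
    start = n_train
    while start + n_test <= n_obs:
        yield range(0, start), range(start, start + n_test)
        start += n_test


# Prints three folds, testing on rows 4-5, 6-7 and 8-9
for train, test in walk_forward_splits(n_obs=10, n_train=4, n_test=2):
    print(f"train 0-{train[-1]}, test {test[0]}-{test[-1]}")
```

In each fold, the threshold search would be rerun on `df.iloc[train]` and the best threshold evaluated on `df.iloc[test]`, giving several out-of-sample Sharpe ratios instead of one.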
The robustness of the strategy can also be checked by plotting its cumulative pnl curve.
```python
import matplotlib.pyplot as plt

# Plot the cumulative pnl of the optimized threshold
thres = opt['thres'].iloc[0]
df['posi'] = np.where(df[score] > thres, 1, np.where(df[score] < -thres, -1, 0))
df['pnl'] = df['posi'].shift(1) * df['p_%chg'] - df['posi'].diff().abs() * tc
df['cpnl'] = df['pnl'].cumsum()  # arithmetic sum, assumption: no reinvestment
df['cpnl'].plot(title='Cumulative pnl')
plt.show()
```
Example of pnl plot
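The curve above sums daily pnl, which assumes no reinvestment. The sketch below contrasts that with a compounded curve and adds the maximum drawdown, one common way to summarize how rough the pnl curve is. The helper names and the pnl values are illustrative, not results from our strategy.

```python
import pandas as pd


def cumulative_pnl(pnl: pd.Series, compound: bool = False) -> pd.Series:
    """Cumulative pnl: a running sum, or compounded if profits are reinvested."""
    pnl = pnl.fillna(0.0)
    return (1 + pnl).cumprod() - 1 if compound else pnl.cumsum()


def max_drawdown(cum_pnl: pd.Series) -> float:
    """Largest drop from a running peak of a cumulative pnl curve."""
    return float((cum_pnl.cummax() - cum_pnl).max())


pnl = pd.Series([0.02, -0.01, -0.03, 0.04])  # illustrative daily pnl
cpnl = cumulative_pnl(pnl)
print(cpnl.round(4).tolist())        # [0.02, 0.01, -0.02, 0.02]
print(round(max_drawdown(cpnl), 4))  # 0.04
```

Over short samples with small daily pnl the two curves are close; they diverge as returns accumulate, so the choice should match how capital would actually be managed.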
Wish you a nice day!
Thank you for reading! We hope this post helps you understand how to carry out a backtest and optimize your trading strategy with Python. Have fun with your own project, and we hope to see you again soon!