Ensemble Learning – Stashable

Candlesticks are an important tool used by financial analysts for predicting stock movement. Literature on stock market analysis often provides various combinations of candlestick shapes and sizes, taken singly or in groups of three or four, as heuristics that can give insight into the behavior of stock prices. This project is an exercise in trying to understand whether the measurable properties of candlesticks, used as input features to one or more machine learning algorithms, can provide reasonable predictions of stock price movement.

Parts Of a Candlestick

A candlestick is used to describe the price movements over a day. Usually candlesticks are plotted over a period of days or months to create candlestick charts. Any given day’s candlestick represents:

Open Price
Close Price
High Price
Low Price

The candlestick is also assigned a color depending on whether the Open Price is greater than the Close Price.

The Problem Statement

Here is a detailed description of the problem:

To keep the problem simple, I would like to predict only the direction in which the stock moves compared to the previous day (up or down), without taking the absolute price of the stock into consideration.
As the input feature space, I have used lagged values of the parts of the candlestick, the trading volume and the percentage change in stock price, up to five lags.
I have applied a number of algorithms, including SVM, Logistic Regression, Neural Net, Decision Tree and Random Forest. Additionally, I tried ensemble learning with VotingClassifier.

Choosing the preliminary characteristics

I’m using the same dataset as in my last post: the TCS dataset. I decided to work with the High Price, Close Price, Open Price and Low Price, since these are the features that make up a candlestick anyway. As an additional feature, I’ve also taken the traded volume of shares.

Feature Engineering

I also created a few more feature variables, namely:

delta – The difference between the Open Price and the Close Price
sizee – The absolute value of delta
top – The higher of the Open Price and the Close Price
bottom – The lower of the Open Price and the Close Price
colour – Takes a value of 1, -1 or 0: 1 if the Open Price is higher than the Close Price, -1 if the Close Price is higher than the Open Price, and 0 if they are equal
UpperShadow – The difference between the High Price and top
LowerShadow – The difference between bottom and the Low Price
PriceChange – The difference between the current day’s Close Price and the previous day’s Close Price
PriceChangePercentage – PriceChange expressed as a percentage
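A minimal pandas sketch of these engineered columns follows. It is not the author's notebook code. It assumes the dataframe has columns named "Open Price", "Close Price", "High Price" and "Low Price", one row per trading day in date order. Two assumptions are called out in the comments: the sign of delta (Open minus Close) and the base for PriceChangePercentage (the previous day's Close Price). The post does not pin down either of them.

```python
import numpy as np
import pandas as pd


def add_candlestick_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the engineered candlestick columns added."""
    out = df.copy()
    open_, close = out["Open Price"], out["Close Price"]
    # Assumption: delta is Open minus Close (the post does not state the sign)
    out["delta"] = open_ - close
    out["sizee"] = out["delta"].abs()
    out["top"] = np.maximum(open_, close)
    out["bottom"] = np.minimum(open_, close)
    # 1 if Open > Close, -1 if Close > Open, 0 if equal
    out["colour"] = np.sign(open_ - close).astype(int)
    out["UpperShadow"] = out["High Price"] - out["top"]
    out["LowerShadow"] = out["bottom"] - out["Low Price"]
    # Day-over-day change in Close Price (NaN for the first day)
    out["PriceChange"] = close.diff()
    # Assumption: percentage relative to the previous day's Close Price
    out["PriceChangePercentage"] = close.pct_change() * 100
    return out


# Three hypothetical days: a rising, a falling and a flat candle
prices = pd.DataFrame({
    "Open Price":  [100.0, 102.0, 101.0],
    "Close Price": [101.0, 101.0, 101.0],
    "High Price":  [103.0, 104.0, 102.0],
    "Low Price":   [99.0, 100.0, 100.5],
})
features = add_candlestick_features(prices)
print(features["colour"].tolist())  # [-1, 1, 0]
```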

The last column is direction, which is what I’m trying to predict. It is the sign of the difference between the current day’s Close Price and the previous day’s Close Price, so it takes the value of:

1 – if the current day’s Close Price is higher than the previous day’s
-1 – if the current day’s Close Price is lower than the previous day’s
0 – if there has been no change
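Because direction is only the sign of the day-over-day change in Close Price, it can be computed in one step. The sketch below assumes a pandas Series of closing prices in date order. The helper name make_direction is hypothetical.

```python
import numpy as np
import pandas as pd


def make_direction(close: pd.Series) -> pd.Series:
    # Sign of today's Close minus yesterday's Close: 1 up, -1 down, 0 unchanged.
    # The first day has no previous close, so its value stays NaN.
    return np.sign(close.diff())


close = pd.Series([100.0, 101.5, 101.5, 99.0])
direction = make_direction(close)
print(direction.tolist())  # [nan, 1.0, 0.0, -1.0]
```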

Creating the lag variables

I first created a function called lagger, which lags every column whose name is passed to it.

I passed all the feature variables to this function.
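The post does not show lagger itself. The following is one plausible sketch built on pandas' shift. Its signature, the default of five lags and the "_lag" column-naming scheme are assumptions made for illustration.

```python
import pandas as pd


def lagger(df: pd.DataFrame, columns, n_lags: int = 5) -> pd.DataFrame:
    """Hypothetical sketch: append lagged copies of each named column."""
    out = df.copy()
    for col in columns:
        for lag in range(1, n_lags + 1):
            # shift(lag) moves values down, so row t holds the value from day t - lag
            out[f"{col}_lag{lag}"] = out[col].shift(lag)
    return out


df = pd.DataFrame({"delta": [1.0, 2.0, 3.0, 4.0]})
lagged = lagger(df, ["delta"], n_lags=2)
print(lagged.columns.tolist())  # ['delta', 'delta_lag1', 'delta_lag2']
```

Shifting forward in time means each row only sees past values. This is what keeps the lagged features free of information from the day being predicted.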

The columns of the updated dataframe are now:

Preparing the dataframe

I then removed all the columns apart from the lagged columns and the direction column from the dataframe.

The new columns are now:

I then removed the first few records, since their lagged values were NaN (there are no earlier days to lag from).
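Both cleanup steps can be sketched together. This assumes lagged columns are recognisable by a "_lag" substring in their names, which follows the hypothetical naming used for lagger above rather than anything shown in the post. The helper name prepare is also hypothetical.

```python
import numpy as np
import pandas as pd


def prepare(df: pd.DataFrame, target: str = "direction") -> pd.DataFrame:
    # Keep only the lagged feature columns (assumed to contain "_lag") and the target
    keep = [c for c in df.columns if "_lag" in c] + [target]
    # Drop the leading rows whose lagged values (or target) are NaN
    return df[keep].dropna()


df = pd.DataFrame({
    "delta": [1.0, 2.0, 3.0, 4.0],
    "delta_lag1": [np.nan, 1.0, 2.0, 3.0],
    "direction": [np.nan, 1.0, 1.0, 1.0],
})
model_df = prepare(df)
print(model_df.shape)  # (3, 2)
```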

Splitting the data

I set direction as the target variable and the remaining columns as the input features, and split the data into training and test sets:
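A sketch of the split with scikit-learn's train_test_split follows. It runs on a synthetic stand-in dataframe with made-up column names, not the TCS data. The post does not say whether the rows were shuffled. This sketch uses shuffle=False so that every test day comes after every training day, which avoids training on the future.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical stand-in for the prepared dataframe: 200 days, 3 lagged features
model_df = pd.DataFrame(
    rng.normal(size=(200, 3)),
    columns=["delta_lag1", "sizee_lag1", "Volume_lag1"],
)
model_df["direction"] = rng.choice([-1, 1], size=200)

X = model_df.drop(columns="direction")  # input features
y = model_df["direction"]               # target
# shuffle=False keeps the test set strictly later in time than the training set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)
print(len(X_train), len(X_test))  # 150 50
```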

Decision Tree:
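A minimal sketch of fitting and scoring a decision tree with scikit-learn follows. Random synthetic data stands in for the lagged features, so its accuracy says nothing about the TCS results. The max_depth value is an illustrative choice, not the post's setting.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))     # hypothetical lagged features
y = rng.choice([-1, 1], size=300)  # hypothetical direction labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)

# max_depth limits how far the tree can overfit noise in the training days
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_train, y_train)
accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"Decision tree accuracy: {accuracy:.3f}")
```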

This gave me an accuracy of:

Doing the same for other learners:
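Repeating the fit for several learners fits naturally in a loop. The sketch below uses synthetic data and illustrative hyperparameters, none of which come from the post. It scales the inputs for logistic regression and the neural network, since both are sensitive to feature scale and volume is far larger than price differences. Tree ensembles do not need scaling.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))     # hypothetical lagged features
y = rng.choice([-1, 1], size=300)  # hypothetical direction labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)

learners = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Neural Network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    ),
}

scores = {}
for name, model in learners.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```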

Random Forest:

Logistic Regression:

Neural Network:

Neural Network with GridSearchCV
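A sketch of tuning the neural network with GridSearchCV follows. The parameter grid is hypothetical, because the post does not list the values it searched. One swapped-in choice is named plainly: TimeSeriesSplit for the cross-validation folds. It keeps each validation fold later in time than the data it was trained on, whereas the default k-fold split would mix past and future days.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X_train = rng.normal(size=(200, 10))     # hypothetical lagged features
y_train = rng.choice([-1, 1], size=200)  # hypothetical direction labels

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPClassifier(max_iter=300, random_state=0)),
])
# Hypothetical grid; "mlp__" routes each parameter to the MLP step
param_grid = {
    "mlp__hidden_layer_sizes": [(16,), (32,), (16, 16)],
    "mlp__alpha": [1e-4, 1e-2],
}
search = GridSearchCV(
    pipe, param_grid, cv=TimeSeriesSplit(n_splits=3), scoring="accuracy"
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```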

Squeezing the feature space

Lastly, I tried to reduce the number of inputs fed to the learners using PCA. I computed 5 principal components, used them as the input, and concatenated the direction column to this new dataframe.
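The PCA step can be sketched with scikit-learn as follows, on synthetic features. The standardisation before PCA is an assumption not stated in the post. It is included because PCA follows variance, and unscaled volume would otherwise dominate every component.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Hypothetical stand-in for the lagged feature matrix and its direction column
features = pd.DataFrame(
    rng.normal(size=(250, 20)), columns=[f"f{i}" for i in range(20)]
)
direction = pd.Series(rng.choice([-1, 1], size=250), name="direction")

# Put every feature on the same scale before extracting components
scaled = StandardScaler().fit_transform(features)
pca = PCA(n_components=5)
components = pca.fit_transform(scaled)

pca_df = pd.DataFrame(components, columns=[f"PC{i + 1}" for i in range(5)])
pca_df = pd.concat([pca_df, direction], axis=1)  # append the target column
print(pca_df.shape)  # (250, 6)
print(pca.explained_variance_ratio_.sum())
```

In a stricter setup, the scaler and PCA would be fitted on the training rows only and then applied to the test rows, so the test period does not influence the components.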

Voting Classifier

I used a voting classifier on the features returned by PCA to predict the direction. Unfortunately, this did not yield noticeably better results.
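A minimal sketch of scikit-learn's VotingClassifier on five PCA-style inputs follows. The post does not say which estimators went into the vote or whether voting was hard or soft. The three models and hard voting below are assumptions. With hard voting, each model casts one vote per day and the majority label wins. An odd number of voters avoids ties on a two-class target.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 5))      # hypothetical stand-in for 5 principal components
y = rng.choice([-1, 1], size=300)  # hypothetical direction labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)

voter = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)
accuracy = accuracy_score(y_test, voter.predict(X_test))
print(f"Voting classifier accuracy: {accuracy:.3f}")
```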

Conclusion

As the results above show, the predictions from all the exercises are of poor quality, only slightly better than random. This raises doubts about the usefulness of candlestick parts as features for predicting stock price direction.
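One way to make "slightly better than random" concrete is to compare each model against a trivial baseline scored on the same test split. The sketch below uses scikit-learn's DummyClassifier, which always predicts the most frequent training label. The data is synthetic, with an arbitrary mild class imbalance. A learner is only adding information if it clearly beats this number.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 5))
# Hypothetical labels with a mild class imbalance
y = rng.choice([-1, 1], size=300, p=[0.45, 0.55])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)

# Always predicts whichever direction was most common in the training days
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
print(f"Majority-class baseline: {baseline_acc:.3f}")
```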

Get the code:

You can find the code (Jupyter notebook) on my GitHub here.

I hope that this exercise was somewhat useful to you. If you have any thoughts, please leave them in the comments below and I’ll get back to you soon!

Tag: Ensemble Learning

Investigating the use of candlestick parts as a feature space for predicting stock direction