However, stock markets are highly volatile. Their price movements often depend on decisions taken by the company, investor sentiment and reactions, social factors, human emotions, and the price movements of other related stocks. Data of this kind cannot be made available to a program, so the prediction can be close only if the market remains relatively stable.
You don't need more than a minimal grasp of machine learning in Python to understand how this program works; I'll describe each step of the machine learning process. And don't forget to download the source code of this program; the link is provided at the end. Let's jump in right away!
Data Source
Required Modules and Libraries
The Code
Firstly, we set about importing all our required modules:
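The exact import block isn't reproduced here, so below is a minimal sketch of it, assuming the libraries used throughout the rest of this walkthrough: mysql.connector for the database, pandas for the data, scikit-learn for pre-processing and scoring, and XGBoost for the model.

```python
# Sketch of the import block, reconstructed from the walkthrough below.
import time

import pandas as pd
import mysql.connector

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from xgboost import XGBRegressor
```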
Our main body, which consists of user input, data collection, data processing, and result delivery, is as follows:
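What follows is a hedged sketch of that main body rather than the exact original. The database credentials, the database name, the one-table-per-ticker layout, and the column names (date, open, high, low, close, volume) are all placeholder assumptions; model_testing(), final_model(), and time_taken() are the functions defined later in this post.

```python
# Sketch of the main body; credentials, database name, and table
# layout are assumptions. model_testing(), final_model(), and
# time_taken() are defined further below.
db = mysql.connector.connect(
    host="localhost", user="root",
    password="your_password", database="stock_data"
)
cursor = db.cursor()

report_status = input("Enable report status? (y/n): ").lower() == "y"

while True:
    try:
        ticker = input("Enter stock ticker: ").upper()
        cursor.execute(f"SELECT * FROM {ticker}")  # one table per ticker (assumed)
        rows = cursor.fetchall()
        if not rows:
            raise ValueError("no data for this ticker")
    except Exception:
        print("Invalid stock ticker, please try again.")
        continue

    # Record each column of the table in its own list.
    dates, opens, highs, lows, closes, volumes = [], [], [], [], [], []
    for date, op, high, low, close, volume in rows:
        dates.append(date); opens.append(op); highs.append(high)
        lows.append(low); closes.append(close); volumes.append(volume)

    data = pd.DataFrame({"open": opens, "high": highs, "low": lows,
                         "close": closes, "volume": volumes}, dtype=float)

    # Today's features predict tomorrow's close, so shift the target up a row.
    data["tomorrow_close"] = data["close"].shift(-1)
    X = data.drop("tomorrow_close", axis=1)
    y = data["tomorrow_close"]

    # The last row has no known "tomorrow" yet: it is the prediction input.
    X_predict = X.iloc[[-1]]
    X, y = X.iloc[:-1], y.iloc[:-1]

    # Hold out the most recent rows as validation data.
    split = int(len(X) * 0.8)
    X_train, X_valid = X.iloc[:split], X.iloc[split:]
    y_train, y_valid = y.iloc[:split], y.iloc[split:]

    # Separate string columns from numerical ones and pre-process both.
    categorical_cols = [c for c in X.columns if X[c].dtype == "object"]
    numerical_cols = [c for c in X.columns if c not in categorical_cols]
    preprocessor = ColumnTransformer(transformers=[
        ("num", SimpleImputer(strategy="mean"), numerical_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])

    # Try learning rates from 0.02 to 0.06 and keep the one with the
    # lowest mean absolute error on the validation rows.
    start = time.time()
    scores = {}
    for lr in (0.02, 0.03, 0.04, 0.05, 0.06):
        scores[lr] = model_testing(preprocessor, X_train, y_train,
                                   X_valid, y_valid, lr)
        if report_status:
            print(f"learning rate {lr}: mean absolute error {scores[lr]:.2f}")
    best_lr = min(scores, key=scores.get)
    time_taken(start, report_status)

    prediction = final_model(preprocessor, X, y, X_predict, best_lr)
    print(f"Predicted next closing price for {ticker}: {prediction:.2f}")

    if input("Predict another stock? (y/n): ").lower() != "y":
        break
```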
So we have initiated our MySQL connection to get hold of our database, and taken a report_status input from the user (if report_status is enabled, we provide the user with some additional information along the way). The code after that sits in a while loop, so that we can keep predicting for as many stocks as we want.
First, we ask the user for the stock ticker, wrapped in a try-except statement to deal with invalid tickers. Subsequently, we run a query to fetch all the data for that ticker from our database, and record it in various lists using a for loop.
Then we come to data processing. The idea of this model is to use today's stock data to predict tomorrow's price. We use all the rows of our stock table except the last as training data, and the last row as test data, since it holds the data needed to predict tomorrow's price. The data is divided into X and y, where X contains the 'features', the values that influence the variable we want to predict, and y contains the column of the variable to be predicted, the 'result' arising from that set of features.
The X and y parts of the data are further divided into train (the rows used to train the model) and valid (rows whose results are already known, so we can compare our predictions against them to get the mean error; no prediction is perfect, and every prediction carries at least a minimal error). We then try different parameters of our machine learning model to lower this error. Lowering the error on the validation data does not necessarily lower the error in actual prediction, and in some cases can even increase it. This problem is called overfitting: the model tunes itself too closely to the validation data and scores well on it, but fails to perform similarly on genuinely new data. You can decrease the size of the training data and thereby increase the size of the validation data to reduce this effect, but it shouldn't cause much of a problem here.
categorical_cols refers to columns in the data with non-numerical values like strings (machine learning models typically work with numerical data), while numerical_cols refers to the numerical columns. The few lines of code after that use Pipelines to pre-process both kinds of columns and build a model out of them.
Next, we initialize the learning rate (a small constant that each individual tree's prediction is multiplied by to give a better overall prediction) at 0.02. We run a for loop that sends the pre-processed data to a defined function model_testing() to get the mean absolute error for every learning rate from 0.02 to 0.06 (a sweet spot, hopefully). We then use the learning rate with the lowest error to build our final model and perform the prediction with another function, final_model(). The last few lines are a couple of if-else statements that present the result to the user appropriately. Moving on to our own defined function model_testing():
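The original function isn't reproduced here; this is a minimal sketch consistent with the description that follows, assuming the pipeline and the split data are passed in as arguments.

```python
# Sketch of model_testing(); the argument list is an assumption.
def model_testing(preprocessor, X_train, y_train, X_valid, y_valid, lr):
    model = XGBRegressor(n_estimators=1000, learning_rate=lr, n_jobs=4)
    pipeline = Pipeline(steps=[("preprocessor", preprocessor),
                               ("model", model)])
    pipeline.fit(X_train, y_train)
    preds = pipeline.predict(X_valid)
    # Lower mean absolute error = better model.
    return mean_absolute_error(y_valid, preds)
```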
So we create an XGBRegressor model with the learning rate passed in from the for loop discussed above. It makes the prediction, compares it with the actual answer we already have, and returns the mean absolute error, a.k.a. the score (the lower the score, the better the model). Moving on to the function final_model():
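Again, a sketch under the same assumptions as above:

```python
# Sketch of final_model(); retrains on all historical rows with the
# best learning rate and predicts tomorrow's close from today's row.
def final_model(preprocessor, X, y, X_predict, best_lr):
    model = XGBRegressor(n_estimators=1000, learning_rate=best_lr, n_jobs=4)
    pipeline = Pipeline(steps=[("preprocessor", preprocessor),
                               ("model", model)])
    pipeline.fit(X, y)
    return float(pipeline.predict(X_predict)[0])
```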
This is similar to the previous function model_testing(), except that it takes the learning rate with the lowest score and returns the final prediction. Finally, we also have a function time_taken(), which records the time taken to find the best learning rate and displays it to the user if report_status is True.
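A sketch of that helper, assuming it is handed the start time recorded before the learning-rate loop:

```python
# Sketch of time_taken(); the signature is an assumption.
def time_taken(start, report_status):
    if report_status:
        print(f"Time taken to find the best learning rate: "
              f"{time.time() - start:.1f} seconds")
```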
And so our program is complete and ready to run! Make sure your copy reflects any changes you might have made, like your MySQL password, database name, file directories, etc. Also change the model parameter n_jobs and set it equal to the number of cores in your processor (for faster training).
Quick Important Note
If you read the blog post Stock Market Recorder: Stock Market Data Collection Using Python, you will know that the collection program stores each day's stock data on the following day, after market activity ends. So if you want to predict a stock's price for today, you would ideally run the program in the morning before the market opens, and it will predict today's price.
The Working
A sample input of a stock ticker, with report_status set to True, shows the following output:
Note: Your PC might run its cooling fans at a higher speed, since the program trains the model multiple times. This is normal and expected.
And there you have it! A stock price predictor that works at your convenience. Here's the source code of the entire program: Download Python Program. Stay tuned to this blog for more.