Skip to main content

Stock Market Recorder: Stock Market Data collection using Python

 


Stock Market Data Collecting Program

You must have heard of stocks here and there, sometimes its on the television, some day a stock crashes, or a stock surges, it has become a star of the show when it comes to finance in recent times. Huge companies involving small investors like you and me who are willing to invest our money in their company for a tiny, tiny, tiny share in return for hopeful profits is very common nowadays.

If you are one of those people who have a keen interest in stocks, then you must have certainly used one of the apps on your phone which tells you the price of the stock on that day, the company's market capital, trading value, trading quantity and what not. Apart from the financial point of view, its fun when you invest a fraction of your pocket money on these and see them up by 0.01 the next day. It makes you feel like you are destined to be the next Warren Buffet.

Some people actually make a living out of it. They watch the stocks wherever it goes. But the majority of us don't get so much time. Even navigating to the app on your phone sometimes feels lazy. And on day 3 of your stock market career, you have just lost your interest in it as well.

Therefore, I present to you a small Python Program that is ready to do that for you. We will create a simple yet very useful program which collects stock data from the web and stores it for you to analyze. From there on you can create multiple other programs to effectively analyze your own stocks.

Required Python Modules

You will need to install python module called urllib using pip, which essentially allows us to perform requests from the server and get desired information. You will also need to install pandas for getting information from '.csv' files and mysql.connector to create a link between Python and MySQL to store extracted information. We also import mysql.connector.errors which comes with mysql.connector. We will also use datetime and time, but these are already pre-installed with Python.

Overview of Our Program

Before we jump onto the code, let me shortly tell you how our program is going to function. On startup, our program will access the internet (make sure you are connected to the internet, else there will be an error shown) and search for a .csv file containing information of the day's trades. These can be easily found in one's own national stock exchange, listed under Market Activity or Resources. The program will download this file in a predetermined location. Then the program will extract the information using pandas pd.read_csv function and transfer the information, stock-wise, to our MySQL Database.

Points To Note

I am an Indian, and therefore I will display the sources as in my country, and it hopefully should be easy to adopt it for yours as well. And Generally, many stock exchanges upload the market trades the next day, so they are in no way beneficial for real-time data. Rather, they will be more helpful for data visualization. I shall explain the code snippets one by one, those interested in just taking the source code, can skip ahead to the end. No Strings Attached.

First, The Data Source

First, you have to navigate to the website of your preferred stock exchange and find the suitable '.csv' file which contains our required information. Just so that I am clear, the information is expected to contain the symbol/name of the stock/company, the open/close prices, trading volume, quantity, and other extra details that may be available for you. Here's where I got mine:

Official Website of a Stock Exchange


We have NSE (National Stock Exchange) here in India, and I have found the .csv file Market Activity Report under Market Data.

Now, Right-Click on the download link and click on Copy link address. This will copy the address. Create a notepad and paste it there for future use.

Now comes a somewhat tricky part. I am not very sure, but from what I understand, these stock exchanges must be uploading their data automatically, instead of manually. Therefore, the link will change everyday, but it will be predictable.

For example, the file I got had the link: https://archives.nseindia.com/archives/equities/mkt/MA060121.csv
Can you guess how the link will change everyday? If you can see clearly, after the letters MA after the final / (backslash), are the numbers 060121. Can you guess what these numbers might be? Yes, this is a date! 06 corresponds to 6th, 01 corresponds to the month, January, and 21 corresponds to the year 2021 (Americans, Beware of the Date Format!). So now we know that the only change in the link are going to be these numbers, which are simply the date. Voila!

Also go ahead and download this file somewhere yourself for the next part.

The '.csv' Data File

Go ahead and open your file. Let me show mine:
Downloaded File from Stock Exchange Website


 As you open the file, you may see some unwanted information like Top 20 Gainers, Top 20 Losers and all, so slide down to the required stuff which has the normal market data:


Downloaded file from Stock Exchange Website

As you can see, I have found my useful stuff under Securities Price Volume Data in Normal Market. So here comes another somewhat of a tricky part. Try to find a sort of landmark that is unique near the useful data, either the heading Securities Price Volume Data in Normal Market or the Column headers SYMBOL,SERIES,CLOSE PRICE...etc. This will enable the program to omit the unwanted data as per the given landmark. In case there is more unwanted data beneath this data, you may need another 'landmark' over there to omit that too.

So again, go ahead and copy paste this 'unique landmark' for future use.

Just a Little Somethin' for MySQL

The file that I have shown you has an example contains nearly 1,900 stocks. So we need 1900 tables in our MySQL database. Each table will have the name/symbol of the stock that it is storing. So go ahead and create a database for this. I have created my database daily_stock_recorder. The good part is that you don't need to create these tables manually, the program will do that too, as i will show in the code.

The Code

Finally we have arrived at the code. Lets begin right away. Firstly, importing the modules mentioned earlier:


Under Points to Note, I had told you that the day's trades are uploaded the next day. So essentially, we are working with the previous day's data every time we start the program. This means, we need the previous day's date to fill in the link, so that Python can find the .csv file. Here's getting the previous day's date and making the file link:


So we take yesterday's date, separate it into day, month and year, and then stick it back into the link. The if-elif statements are to determine the different forms of the date, since the date format we receive from Python and from general are not always same. We have our link ready in the variable mkt.

Now we create a function download_all_stocks(csv_url) which will take the variable 'mkt' as an argument and download the '.csv' file from the internet:


So there is a response = request.urlopen() which opens the provided url. We take csv = response.read() to take the content and then later convert it to type str. This csv file will be raw and be on a single line, so we tell the program to go to the next line when it encounters 'n using lines = csv.str.split("\\n"). Note: There is double-slash \\ to remove special meaning of  \.

Then we are running a for loop which is going line by line in the variable lines and checking for the landmark that we want. I had taken the title as my landmark, with my required data right below it. We have also introduced a variable stat which is by default false. As long as it does not encounter the landmark, the stat remains false and the line is omitted (useless data)(By omitted, I mean that the program simply skips the line without writing it). As soon as it encounters the landmark, stat becomes true and the lines after it are considered.

For those who have useless data beneath their useful data, they need to follow similar process, except this time, stat becomes false if it encounters the second landmark. There is also a variable n which is storing the number of lines that are accepted, which would infer as number of stocks, but we have another variable for that later on, so we are not asking its value, though its getting updated with each line.

Going ahead, we need to check if yesterday was a weekend. Stock markets are closed on weekends, so if you didn't put the if-elif command for that, your program will keep running, because it is trying to find a file that doesnt exist. Then we are initiating the download_all_stocks() function:


So we download the stocks and use result_df.drop() to drop stocks with duplicate names. We also use the .reset_index(drop=True) function, because after removing the duplicates, the indexes will be irregular, and by doing this we are resetting the index to make it uniform again. Then we initiate the MySQL connection to our database that we have created. Then we initiate three variables stocks_entered, duplicates, and new_stocks which will give us the total number of stocks recorded, the duplicates, and the new stocks entered that day. These are somewhat governed by the error statements under except. I'll come to that in a bit.

The for-loop goes over each stock present in the pandas dataframe stored in the variable 'stock_df', gets the required columns from variable 'data' that we got from 'cursor.fetchall()' and inputs them in their corresponding table in the Database.

Now here's a very important part:

 Can you see the first line under try statement:                     cursor.execute('insert into `%s` .......'%(...)) 

We are asking MySQL to insert the values in a table %s (values of which come in a tuple after it), which is written as `%s`. Note that these are not quotes that surround %s. This is a special character that lies below the 'esc' key on your keyboard. This is because MySQL rejects table names with special characters. But what do we do if a stock symbol contains a special character? That's where this '  `  ' key comes to play. Using this instead of your normal quotes, MySQL won't reject the table names. Use this while inserting the values as well as if you are using some other program to extract data from this database. This special Character is called a backtick, and its exact location on a standard keyboard is shown below:


Backtick on a Standard Keyboard


Now coming to the error statements. MySQL gives an error with error number 1146 if there is a new stock, whose table doesnt exist in the database. This will occur if you are starting the program for the first time, when all the stocks mean a new table for your database, or when the Stock Exchange introduces a new stock.

MySQL also gives error with error number 1062 if the data contains duplicate value. For example if you ran the program twice on a day, the data remains the same, so inputting the data again the second time is not possible as MySQL tables can't take the same values again. In that case, we simply put a pass statement to do nothing.

So we put the above mentioned errors under the except statement using the mysql.connector.error module that we imported in the beginning. In the end you have the print statements that will tell you how many stocks have been inputted, how many new stocks, and how many were duplicates.

So there you have it. You are now in possession of a Data Source that can do a huge task in a matter of seconds, recording thousands of stocks for you to analyze. But don't yet go now! I have something in store for you:

Till now, there is a good chance, you must have been clicking on your python programs to run them. How Notorious! What if the program runs when you simply turn on your system. Yep, you heard me right, all you have to do is turn on your pc, and your program will update its database each day. Here's how:

Create a shortcut of this python project. Cut it, and then go to File Explorer. Once there, click on the directory/path that shows on the top. Delete whatever is already there and paste this: C:\Documents and Settings\All Users\Start Menu\Programs\Startup

Click Enter and you will be taken to the directory mentioned above. Paste your python program shortcut here and close everything. That's it. Now every time you turn on your pc, the program will open on its own, run the process, show you some details and then close itself. Voila again!

The Working

So when you run the program, don't be impatient. God knows how many stocks there are in the stock exchange. It takes a good deal of work for your pc to transfer the information. Don't press anything or close the window as long as your cursor is blinking. Here's the Result:

 

Program shows details of stocks entered


This is no end. After a few days you will have a database with lots of information. You can plot graphs, maybe even use Machine learning algorithms or just about anything. You know the saying 'The sky is the limit!'

And for getting the information, just go to the database and input the select* from <stock_symbol> command.

Additional Important Note:

Indian viewers can simply refer above codes for themselves. But for others there are some important changes that you may have to do yourself:

  1. As mentioned earlier, you have to find your own required .csv file from any website and also have to figure out the pattern with which the link changes everyday.
  2. If your link does change as per the date, then make sure about the time format, and change the if-else statements accordingly.
  3. Figure out the unique landmarks on your file so that the program can extract useful information. Again, don't get landmarks which may change, rather something like headings which will probably remain same.
  4. pd.read_csv() is a pandas function which allows for information extraction from csv files. It takes the complete directory of the file along with its name as argument. Note that this directory is the same as what you must have given as dest_url for downloading the file.
  5. Note that in front of the address of a directory, r is written before the quotes containing the address. This declares the address to be 'raw string', so that special meaning of '/' doesn't apply.
  6. Note that in mycon = mc.connect(host = 'localhost',.....), all details except database will remain same, provided you have not applied any specific settings for your MySQL. Change your database name accordingly with the name of the database where you plan to store the data. Make sure you have already created a database for it.
  7. My file had 4 columns: SYMBOL, CLOSE PRICE, TRADE_VALUE and TRADE_QUANTITY, therefore I am calling these four columns for each stock in the .csv file using a for loop. In case you have different columns, or different number of columns, you will have to change the code accordingly.
  8. Note that you can run the program however many times a day. But the program simply won't add the data after the first time.
  9. Enjoy! 

Here's the download link for the entire code: Download Python File

Perform the tweaks mentioned in Additional Important Note and you can run the program with no issues.

Enjoy!


Comments

Popular posts from this blog

Predicting Stock Prices using Machine Learning (XGBoost)

  Today, I'll show you how to create a machine learning program to predict stock prices. Machine learning is used in a variety of fields, and we can utilize its ability to learn and predict from data to predict useful variables, with minimal error. Therefore in theory, we can apply the same on stocks to predict the next closing price, so that we can make a killing gain. However, stock markets are highly unstable. Their price movement often depends on decisions taken by the company, favor and reaction of investors, social impact, human emotions, and price movements of some other related stocks. These types of data cannot be made available to a program. The prediction can be close only if the market remains relatively stable. You don't actually need to learn machine learning using python to understand how this program works, if not minimal. I'll describe each machine learning process. And don't forget to download the source code of this program, link provided at the end. ...

Simple Omegle Bot Using Selenium With Python

Omegle is an online text-based and video-based chatting platform, which allows users from around the world to talk to complete strangers anonymously, for free! The text-based Omegle has a simple concept: Complete a Captcha  verification, connect automatically to a stranger, and after chatting get on to the next stranger. Apart from having a chit-chat with a stranger from the far side of the globe, Omegle poses as the perfect platform for other uses as well. At Omegle, you  can advertise your content, website, products and more for free. With access to about 40,000 strangers using Omegle at any given time, you can benefit if your ideas/advertisement is seen by potential customers. But of course, we can't advertise to each guy we meet again and again. I mean, come on, that's a lot of hard work, even if your message consists of a few words. But worry not, since that's where our Omegle Bot  comes in play. This bot works on a pre-determined set of messages that are to be conve...

English Dictionary With Python And Tkinter!

English Dictionary on Python Have you ever tried to read from the small Oxford Dictionaries? Those small yet bulky books have thousands of words cramped on a page the length of your finger! And on top of that, not to forget the hassle of flipping across the pages, searching for your word, God knows where is hiding on which page! Nowadays though, hardly does anyone ever use the classic dictionaries. With the access of technology on our fingertips, one tap on Google, and you would be on with your way.  But how about making a program of your own which can do the same for you? Sounds interesting? Such a program can enable you to run it and keep it opened, while you are reading a book, so that you can search the definition of a new word you encountered. What if you are writing a book, perhaps a report? You know what you are supposed to write, but you choose to get a word for it, so that you sound professional. So you just search for a short definition and the program find the word with ...