Automatically Generating A Financial News Corpus Using News Corpus Builder

I had to quickly develop a model that could categorize articles across a wide range of financial topics. Building the corpus manually was not an option, and writing custom crawlers for specific news sites would have been tedious.

I had built a different corpus before, and I lacked a financial one. So I created News Corpus Builder to let myself and others generate corpora on any particular topic or topics.

Google News is the source for articles. Google limits each search term to 100 articles, but you can still build a large corpus by using multiple related search terms to retrieve articles for each topic. Alternatively, you could simply run it daily to grow your corpus.
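To illustrate the multiple-search-terms idea, here is a minimal sketch that does not depend on News Corpus Builder itself. It assumes a generic search function that returns article URLs for a query (`fake_search` below is a hypothetical stand-in, not Google News), caps each query at 100 results, and deduplicates URLs because the same article often matches several related terms. The function and variable names are illustrative only.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical search function type: takes a query string, returns article URLs.
SearchFn = Callable[[str], List[str]]

MAX_RESULTS_PER_TERM = 100  # per-search-term cap described above


def build_topic_corpus(topics: Dict[str, Iterable[str]], search: SearchFn) -> Dict[str, List[str]]:
    """Collect unique article URLs per topic by querying several related terms."""
    corpus: Dict[str, List[str]] = {}
    for topic, terms in topics.items():
        seen = set()
        urls: List[str] = []
        for term in terms:
            # Each query yields at most MAX_RESULTS_PER_TERM articles.
            for url in search(term)[:MAX_RESULTS_PER_TERM]:
                # The same article can match several terms; keep it only once.
                if url not in seen:
                    seen.add(url)
                    urls.append(url)
        corpus[topic] = urls
    return corpus


def fake_search(term: str) -> List[str]:
    """Hypothetical stand-in for a news search: 120 URLs, the first 50 shared by every term."""
    shared = [f"https://example.com/shared/{i}" for i in range(50)]
    own = [f"https://example.com/{term.replace(' ', '-')}/{i}" for i in range(70)]
    return shared + own


corpus = build_topic_corpus(
    {"mergers": ["mergers", "acquisitions", "takeover bid"]}, fake_search
)
print(len(corpus["mergers"]))  # 50 shared + 3 * 50 term-specific = 200 unique URLs
```

Three terms capped at 100 results each yield 200 unique articles here rather than 100, which is the point of querying several related terms per topic. Running the same collection daily and merging into the existing `seen` set would grow the corpus in the same way.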



Introducing Ark Agent: Current & Historical Stock Market EOD Data Application

Ark Agent is an application for collecting current and historical end-of-day (EOD) stock data. It leverages Celery and MongoDB to provide end-of-day and historical market data for stocks. It also uses Finsymbols, a module I wrote a while back.

Please see the nicely formatted documentation on GitHub for more details: Ark Agent Wiki

Obtain Finance Symbols for S&P 500, NASDAQ, AMEX, NYSE via Python

I recently completed Computational Investing Part 1, which further piqued my curiosity. Before completing the course, I had started to play around with a few ideas. To do any kind of analytics on the stock market, you need symbols.

I created a simple module that uses BeautifulSoup to parse the list of S&P 500 symbols from Wikipedia and formats the data so that an application can use it programmatically.

You can grab the code here.

The gist of it: the module returns a list of all the symbols along with their related information.
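For readers who want to see the general technique, here is a minimal sketch of parsing a symbols table into a list of dictionaries. It swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra dependencies, and it is not the Finsymbols code. It assumes the Wikipedia page keeps the constituents in a table with `id="constituents"` and that the first row holds the column headers; that table id, the class and function names, and the sample HTML are all illustrative assumptions.

```python
import urllib.request
from html.parser import HTMLParser
from typing import Dict, List, Optional


class SymbolTableParser(HTMLParser):
    """Collect cell text from the first <table> whose id matches target_id.

    Assumes explicit closing tags and no nested tables (illustrative only).
    """

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.in_table = False
        self.done = False
        self.rows: List[List[str]] = []
        self.current_row: Optional[List[str]] = None
        self.current_cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "table" and dict(attrs).get("id") == self.target_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.current_row = []
        elif self.in_table and tag in ("td", "th") and self.current_row is not None:
            self.current_cell = []

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag == "table":
            self.in_table, self.done = False, True  # stop after the target table
        elif tag in ("td", "th") and self.current_cell is not None:
            self.current_row.append("".join(self.current_cell).strip())
            self.current_cell = None
        elif tag == "tr" and self.current_row is not None:
            if self.current_row:
                self.rows.append(self.current_row)
            self.current_row = None

    def handle_data(self, data):
        # Text inside links (e.g. <a>MMM</a>) still lands in the open cell.
        if self.in_table and self.current_cell is not None:
            self.current_cell.append(data)


def parse_symbols(html: str, table_id: str = "constituents") -> List[Dict[str, str]]:
    """Return one dict per table row, keyed by the header row's column names."""
    parser = SymbolTableParser(table_id)
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def fetch_html(url: str) -> str:
    """Download a page; the User-Agent value here is an arbitrary placeholder."""
    req = urllib.request.Request(url, headers={"User-Agent": "symbols-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


SAMPLE = """
<table id="constituents">
  <tr><th>Symbol</th><th>Security</th><th>GICS Sector</th></tr>
  <tr><td><a href="#">MMM</a></td><td>3M</td><td>Industrials</td></tr>
  <tr><td><a href="#">AOS</a></td><td>A. O. Smith</td><td>Industrials</td></tr>
</table>
"""

symbols = parse_symbols(SAMPLE)
print([s["Symbol"] for s in symbols])
```

Keying each row by the header text keeps the output usable even if columns are reordered. Against the live page you would pass `fetch_html(...)` output to `parse_symbols`, though the real table's id and columns may differ from what is assumed here.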


Stay tuned to see how we then use the list of symbols to obtain prices for each symbol.