Automatically Generating A Financial News Corpus Using News Corpus Builder

I had to quickly develop a model that could categorize articles across a wide range of financial topics. Developing the corpus manually was not an option, and writing custom crawlers for specific news sites would have been a tedious process.

Having had to create a different corpus previously, and lacking a financial one, I created News Corpus Builder to allow myself and others to generate corpora about any particular topic(s).

Google News is used as the source of articles. Although Google limits results to 100 articles per search term, you can still build a large corpus by using multiple or similar search terms to retrieve articles for each topic. Alternatively, you could just run it daily to increase the size of your corpus.
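The idea of working around the per-term limit can be sketched roughly as follows. This is not News Corpus Builder's actual API: build_corpus, fake_search and the example URLs are hypothetical names made up for illustration. Each topic maps to several search terms, results are merged, and duplicates are skipped by URL, so repeated daily runs only add new articles.

```python
from typing import Callable, Dict, Iterable, List

Article = Dict[str, str]  # e.g. {"url": ..., "title": ..., "category": ...}


def build_corpus(
    topics: Dict[str, List[str]],
    search: Callable[[str, int], Iterable[Article]],
    existing: Iterable[Article] = (),
    per_term_limit: int = 100,
) -> List[Article]:
    """Collect articles for every search term of every topic, skipping duplicates."""
    corpus = list(existing)
    seen = {article["url"] for article in corpus}
    for category, terms in topics.items():
        for term in terms:
            for article in search(term, per_term_limit):
                if article["url"] in seen:
                    continue  # already collected via another term or an earlier run
                seen.add(article["url"])
                corpus.append({**article, "category": category})
    return corpus


def fake_search(term: str, limit: int) -> List[Article]:
    """Hypothetical stand-in for a real news search; returns fixed fake results."""
    slug = term.replace(" ", "-")
    results = [
        {"url": "http://example.com/shared", "title": "Appears for every term"},
        {"url": f"http://example.com/{slug}/0", "title": f"{term} story 0"},
        {"url": f"http://example.com/{slug}/1", "title": f"{term} story 1"},
    ]
    return results[:limit]


topics = {
    "Mergers": ["merger", "acquisition"],
    "Earnings": ["quarterly earnings"],
}
corpus = build_corpus(topics, fake_search)
# A second "daily run" over the same results adds nothing new.
second_run = build_corpus(topics, fake_search, existing=corpus)
```

Keeping the search function as a parameter means the same loop works whether the articles come from Google News or from any other source.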



Continue reading…

MySQL High Availability Architectures

To understand how high availability is achieved in MySQL, it is important to have an idea of how replication works. Replication in MySQL is primarily achieved using the binary log. The binary log records all changes made to the database, along with additional information related to those changes, such as the time a statement took to update data. At a high level, the master writes database changes (events) to its binary log file and then notifies the slave(s) of the new updates; the slave(s) then read and apply those changes. The binary log is not only used for replication; it can also be used for auditing and point-in-time recovery. For more details about the MySQL binary log, please visit:
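As a rough mental model only, the flow can be sketched as a toy simulation. This is not how MySQL is implemented: the Master and Replica classes, the event layout and the point_in_time helper are all invented for illustration, and the replica simply pulls events here to keep things short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Event = Tuple[int, str, str]  # (timestamp, key, value)


@dataclass
class Master:
    data: Dict[str, str] = field(default_factory=dict)
    binlog: List[Event] = field(default_factory=list)

    def write(self, timestamp: int, key: str, value: str) -> None:
        """Apply a change locally and record it in the binary log."""
        self.data[key] = value
        self.binlog.append((timestamp, key, value))


@dataclass
class Replica:
    data: Dict[str, str] = field(default_factory=dict)
    position: int = 0  # index of the next binlog event to apply

    def catch_up(self, master: Master) -> int:
        """Read and apply every event after our current position."""
        events = master.binlog[self.position:]
        for _, key, value in events:
            self.data[key] = value
        self.position += len(events)
        return len(events)


def point_in_time(binlog: List[Event], until: int) -> Dict[str, str]:
    """Rebuild the state as of a timestamp by replaying the log from the start."""
    state: Dict[str, str] = {}
    for timestamp, key, value in binlog:
        if timestamp > until:
            break
        state[key] = value
    return state


master, replica = Master(), Replica()
master.write(1, "a", "1")
master.write(2, "b", "2")
applied_first = replica.catch_up(master)   # replica applies both events
master.write(3, "a", "9")
stale_value = replica.data["a"]            # replica has not seen the new event yet
applied_second = replica.catch_up(master)  # replica applies the remaining event
recovered = point_in_time(master.binlog, until=2)
```

The same log serves both purposes mentioned above: a replica replays it from its last position, and point-in-time recovery replays it up to a chosen moment.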

MySQL Binary Log Documentation
How Does MySQL Replication Really Work

Depending on your requirements, there are various architectures, or ways that you can configure MySQL and MySQL Cluster. Below is a summary of some of the most frequently used architectures for achieving high availability.

MySQL Master/Slave(s) Replication

The MySQL master-to-slave(s) configuration is the most popular setup. In this design one server acts as the master database and all other servers act as slaves. The application can only write to the master node.


*Analytic applications can read from the slave(s) without impacting the master
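On the application side, this split between writes and reads might look something like the sketch below. ReplicatedPool and the server names are hypothetical, and the statement check is deliberately naive. Keep in mind that replication lag means a read from a slave can return slightly stale data.

```python
import itertools
from typing import List


class ReplicatedPool:
    """Send writes to the master and spread reads across the slaves."""

    READ_STATEMENTS = {"SELECT", "SHOW"}

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        """Return the name of the server that should run this statement."""
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word in self.READ_STATEMENTS and self._slaves is not None:
            return next(self._slaves)
        return self.master


pool = ReplicatedPool("master", ["slave1", "slave2"])
routes = [
    pool.route("INSERT INTO trades VALUES (1)"),
    pool.route("SELECT * FROM trades"),
    pool.route("select count(*) from trades"),
    pool.route("SELECT 1"),
]
```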

Continue reading…

MySQL Cluster 101

What is MySQL Cluster

MySQL Cluster is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL (“NDB” stands for Network Database). Source: Wikipedia

For simplicity, MySQL Cluster can be defined as a shared-nothing distributed “database” that is designed for fault tolerance, high availability and high performance.

MySQL Cluster Architecture

MySQL Cluster is made up of three main kinds of nodes:

1. Management Node(s) [ndb_mgmd] – Mainly responsible for administrative functions related to the cluster. Management nodes are used to check the status of, start and stop nodes that are a part of the cluster. They are also responsible for distributing information about the makeup of the cluster.

Continue reading…

MySQL Cluster Getting Started [Red Hat/CentOS 6]


1. Download the RPM from


2. Install the RPM and dependencies

yum groupinstall 'Development Tools'

yum remove mysql-libs

yum install libaio-devel

rpm -Uhv MySQL-Cluster-server-gpl-7.3.5-1.el6.x86_64.rpm

This installs all the binaries that will be required to configure each component of the MySQL Cluster.



Continue reading…

RabbitMQ Exchange to Exchange Bindings [AMQP]


The exchange-to-exchange binding allows messages to be routed from one exchange to another. It works more or less the same way as an exchange-to-queue binding; the only significant difference on the surface is that both exchanges (source and destination) have to be specified.

Two major advantages of using exchange-to-exchange bindings, based on experience and what I have found after doing some research, are:

  • Exchange-to-exchange bindings are much more flexible in terms of the topology that you can design; they promote decoupling and reduce binding churn
  • Exchange-to-exchange bindings are said to be very lightweight and, as a result, help to increase performance *
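To make the comparison with an exchange-to-queue binding concrete, here is a minimal sketch. The method names and keyword arguments follow pika's BlockingChannel. The exchange names, queue name and routing key are hypothetical, and RecordingChannel is a stand-in that only records calls so the sketch runs without a broker.

```python
def declare_topology(channel):
    """Forward matching messages from the 'incoming' exchange to the 'audit' exchange."""
    channel.exchange_declare(exchange="incoming", exchange_type="topic")
    channel.exchange_declare(exchange="audit", exchange_type="fanout")
    # Exchange-to-exchange binding: both source and destination are exchanges.
    channel.exchange_bind(destination="audit", source="incoming", routing_key="orders.#")
    # Ordinary exchange-to-queue binding, for comparison.
    channel.queue_declare(queue="audit_log")
    channel.queue_bind(queue="audit_log", exchange="audit")


class RecordingChannel:
    """Stand-in for a real channel: remembers each call instead of talking to RabbitMQ."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the channel methods.
        def record(**kwargs):
            self.calls.append((name, kwargs))
        return record


channel = RecordingChannel()
declare_topology(channel)
```

Against a real broker, the same function could be handed a channel obtained from pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel(). Exchange-to-exchange bindings are a RabbitMQ extension to AMQP 0-9-1, so other brokers may not support the exchange_bind call.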

Exchange-To-Exchange Topology

 Messaging Topology

Continue reading…

Introducing Ark Agent: Current & Historical Stock Market EOD Data Application

Ark Agent is an application that collects current and historical end-of-day stock data. It leverages Celery and MongoDB to provide end-of-day and historical market data for stocks, and it uses Finsymbols, a module I wrote a while back.

Please see the nicely formatted documentation on GitHub for more details: Ark Agent Wiki

Intro : Celery and MongoDB

This post serves as more of a tutorial to get a Hello World up and running using Celery with MongoDB as the broker. Celery has great documentation, but it is spread in snippets across multiple pages, and nothing shows a full working example of using Celery with MongoDB, which might be helpful for new users.

Install All Required Packages

Packages related to Celery.

pip install celery
pip install -U celery-with-mongodb

Ensure that you have MongoDB installed and running. 10gen has great documentation to get you up and running with MongoDB in no time.

In this tutorial we will only use a single module  for both the celery application and the tasks.


1. Create

Let's start by creating a configuration file that will store the settings used to configure Celery to use MongoDB as the results backend, along with other settings.
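A configuration module along these lines is one possibility, not necessarily the one used in the original post. The setting names follow the old uppercase style from Celery 3.x (newer releases use lowercase names), and the host, database name and collection name are assumptions chosen for illustration.

```python
# Hypothetical Celery configuration module (old-style uppercase settings).

# Broker: MongoDB used as the message transport.
# Assumes a local MongoDB on the default port; "celery_jobs" is a made-up database name.
BROKER_URL = "mongodb://localhost:27017/celery_jobs"

# Result backend: store task states and return values in MongoDB as well.
CELERY_RESULT_BACKEND = "mongodb"
CELERY_MONGODB_BACKEND_SETTINGS = {
    "host": "localhost",
    "port": 27017,
    "database": "celery_jobs",              # assumed name
    "taskmeta_collection": "task_results",  # assumed name
}
```

Keeping the broker and the results in the same database makes it easy to inspect everything from a single mongo shell session.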

2. Create Application and Worker

3. Now that we have our application and configs created, let's start it up!

celery -A tasks worker --loglevel=info

Screen Shot 2013-06-15 at 2.46.53 PM


4. Now let's create some work items/tasks to be processed. You could easily write a script for this, but we used the CLI in this example.

Celery Tasks

5. Let's look at the worker's console to see the jobs submitted.

As you can see, the tasks were received and processed.

Screen Shot 2013-06-15 at 2.51.33 PM


6. If you are curious about your MongoDB backend, let's take a look at the collections that will be used to store the results.

Screen Shot 2013-06-15 at 3.02.30 PM


Screen Shot 2013-06-15 at 3.03.01 PM

That's how you get a simple Hello World up and running with Celery and MongoDB. I will be doing a more advanced blog post where we use Celery and MongoDB with proper production configurations.




Obtain Finance Symbols for S&P 500, NASDAQ, AMEX, NYSE via Python

I recently completed Computational Investing Part 1, which further tickled my curiosity. Before completing the course I had started to play around with a few ideas. To do any kind of analytics on the stock market you need symbols.

I created a simple module that uses BeautifulSoup to parse the list of S&P 500 symbols from Wikipedia and format the data nicely so it can be used programmatically in an application.
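The parsing step can be illustrated with a small sketch. To keep it runnable without extra packages it swaps BeautifulSoup for the standard library's html.parser, so it is not the module's actual code. The SAMPLE table is a hand-written stand-in for the Wikipedia page, and parse_symbols is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import Dict, List


class TableRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_symbols(html: str) -> List[Dict[str, str]]:
    """Turn the first row into keys and every later row into a dict."""
    parser = TableRows()
    parser.feed(html)
    header, *body = parser.rows
    keys = [name.lower() for name in header]
    return [dict(zip(keys, row)) for row in body]


# Hand-written sample in the rough shape of a symbols table (not fetched from Wikipedia).
SAMPLE = """
<table>
  <tr><th>Symbol</th><th>Company</th><th>Sector</th></tr>
  <tr><td><a href="#">MMM</a></td><td>3M</td><td>Industrials</td></tr>
  <tr><td><a href="#">AAPL</a></td><td>Apple Inc.</td><td>Information Technology</td></tr>
</table>
"""
symbols = parse_symbols(SAMPLE)
```

Returning plain dictionaries keeps the result easy to feed into whatever comes next, such as looking up prices per symbol.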

You can grab the code here

The gist of it: it returns a list of all the symbols and related information.


Stay tuned to see how we then use the list of symbols to obtain prices for each symbol




Ruby on Rails Note To Self – File Upload App

Wow, this article has been long overdue. I have been learning so much, but have not given this blog the priority it deserves. I refuse to use the excuse of “not enough time” 🙁

Even though the full code is on my GitHub, this post is a reminder and a compilation of the resources I used to complete my first web app using Ruby on Rails.

The File Upload App currently allows a user to upload and save any file type, download the files belonging to that user and, of course, delete uploaded files if necessary. It also has a small admin mode, where a user with administrator privileges can see all files uploaded by all users and can delete both users and uploaded files.

I used the following to develop the complete web app:

  1. Twitter Bootstrap for UI etc.
  2. Devise for authentication
  3. Paperclip for handling uploads
  4. Annotate for detailed model info


Listing All Users Registered Using Devise

Add the following to your users_controller to get all the users.

Continue reading…

Email Service For Your Web Application (Amazon SES)

Consider the wide variety of email services currently available, and the task of keeping all the moving parts of a startup running. For sending emails, why not use one of the services already available? What did you say … you want to save money? Want to be super “lean”? Well, I wasted a lot of time trying to keep my emails from being routed to the spam folder, amongst other things.

  • Metrics around your email
  • One less infrastructure based service to maintain
  • Fewer rejects and as a result higher conversions

Continue reading…