I needed to quickly develop a model that could categorize articles across a wide range of financial topics. Building the corpus manually was not an option, and writing custom crawlers for specific news sites would have been tedious.
Having already had to build a different corpus in the past, and with no financial corpus available, I created News Corpus Builder so that I and others can generate corpora on any topic or set of topics.
Google News is the source of the articles. Although Google returns only 100 articles per search term, you can still build a large corpus by using several related search terms for each topic. Alternatively, you can run it daily to grow your corpus over time.
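To illustrate how several search terms and repeated runs can add up to a larger corpus, here is a minimal sketch. It does not use News Corpus Builder's actual API: `merge_articles`, `fake_fetch`, the article record shape, and the example URLs are all hypothetical names made up for this illustration. The idea is simply to pool the results of every term under one topic and skip any article whose URL has already been collected.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical article record for this sketch: {"url": ..., "title": ...}
Article = Dict[str, str]


def merge_articles(
    corpus: Dict[str, Dict[str, Article]],
    topic: str,
    terms: Iterable[str],
    fetch: Callable[[str], List[Article]],
) -> int:
    """Fetch articles for every term and add unseen ones to corpus[topic].

    Returns the number of newly added articles.
    """
    seen = corpus.setdefault(topic, {})
    added = 0
    for term in terms:
        for article in fetch(term):
            url = article["url"]
            # The same story often matches several related terms,
            # so the URL is used as a deduplication key.
            if url not in seen:
                seen[url] = article
                added += 1
    return added


def fake_fetch(term: str) -> List[Article]:
    """Stand-in for a real search call; returns canned results per term."""
    canned = {
        "interest rates": [
            {"url": "http://example.com/a", "title": "A"},
            {"url": "http://example.com/b", "title": "B"},
        ],
        "federal reserve": [
            {"url": "http://example.com/b", "title": "B"},
            {"url": "http://example.com/c", "title": "C"},
        ],
    }
    return canned.get(term, [])


corpus: Dict[str, Dict[str, Article]] = {}
terms = ["interest rates", "federal reserve"]

# First run collects everything; a repeated run only adds unseen articles.
first_run = merge_articles(corpus, "monetary policy", terms, fake_fetch)
second_run = merge_articles(corpus, "monetary policy", terms, fake_fetch)
print(first_run, second_run, len(corpus["monetary policy"]))
```

In a real daily run, the second call would normally pick up whatever new articles had been published since the previous one, so the corpus keeps growing without accumulating duplicates.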