To build real artificial intelligence (AI), high-quality training data is arguably the single most important ingredient. As computing power becomes more of a commodity with each passing day, AI-focused firms, like Anthropic, Microsoft (MSFT), Alphabet (GOOG), Broadcom (AVGO), Oracle (ORCL), and Amazon (AMZN), are shifting their focus away from building datacenters and buying as many Nvidia (NVDA) chips as possible. Instead, they are looking for truly reliable, high-quality datasets to power their models’ ability to learn, reason, and perform with anything close to human expertise.

Garbage in, garbage out. The old saying from my early days coding in BASIC applies just as well to AI today. AI models require well-structured, relevant, and accurate data to achieve meaningful results, especially in complex fields like finance and investing.

As the critical role of high-quality training data becomes more widely recognized, AI firms are increasingly prioritizing investments in data acquisition and curation. However, high-quality datasets like these are rare and hard to find, so much so that companies are taking rather extreme measures to get or create them.

For example, take Anthropic’s approach to training its AI, Claude. Initially, Anthropic relied on broad internet data. Eventually, the company transitioned to more structured and vetted sources, such as newspapers, and most recently, books. Anthropic reportedly purchased millions of print books, digitized them, and used that content as a more refined and consistent training foundation for its models.

One critical question remains, though: how do we imbue machines with the experience and expertise not found in books so they can perform tasks with skill comparable to that of human experts?

The answer lies in two foundational elements of data science: taxonomy and ontology.

In the context of an investing AI agent, taxonomy refers to the rigorous organization and classification of financial data. Doing this well for investing requires going beyond surface-level metrics and building advanced financial models that incorporate detailed, often-overlooked disclosures, such as those found in the footnotes and Management’s Discussion and Analysis (MD&A) sections of filings, as we’ve pointed out many times.

These sections frequently contain crucial information about subtle and nuanced accounting anomalies, off-balance-sheet items, and forward-looking risks that are essential to understanding a company’s true financial condition.
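To make the idea of a taxonomy concrete, here is a minimal Python sketch of assigning disclosed items to categories. The category names, keyword rules, and sample items are hypothetical illustrations, not the taxonomy behind the Robo-Analyst, which would need far more categories and analyst review rather than simple keyword matching.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Hypothetical top-level taxonomy buckets for disclosed items."""
    OPERATING = "operating"
    NON_OPERATING = "non_operating"
    OFF_BALANCE_SHEET = "off_balance_sheet"


@dataclass(frozen=True)
class DisclosedItem:
    """One item pulled from a filing, with the section it was found in."""
    label: str
    amount: float  # millions, sign as reported (illustrative)
    section: str   # e.g. "income statement", "footnotes", "MD&A"


# Hypothetical keyword rules; a real taxonomy would be much richer.
RULES = {
    "gain on sale": Category.NON_OPERATING,
    "restructuring": Category.NON_OPERATING,
    "impairment": Category.NON_OPERATING,
    "guarantee": Category.OFF_BALANCE_SHEET,
    "purchase obligation": Category.OFF_BALANCE_SHEET,
}


def classify(item: DisclosedItem) -> Category:
    """Return the first matching category, defaulting to OPERATING."""
    text = item.label.lower()
    for keyword, category in RULES.items():
        if keyword in text:
            return category
    return Category.OPERATING


items = [
    DisclosedItem("Revenue", 1200.0, "income statement"),
    DisclosedItem("Gain on sale of subsidiary", 85.0, "footnotes"),
    DisclosedItem("Restructuring charges", -40.0, "MD&A"),
    DisclosedItem("Guarantee of affiliate debt", 300.0, "footnotes"),
]

for it in items:
    print(f"{it.label} ({it.section}) -> {classify(it).value}")
```

The point of the structure is that every item keeps a record of where it came from, so a footnote gain is tagged differently from an operating line item even though both show up in the same reported totals.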

Ontology refers to the structure that allows one to derive meaning from the data. It is the process of mapping the taxonomized data into relationships and patterns that can inform decision-making. For the investing business, the ideal ontology is one that can produce idiosyncratic alpha.
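An ontology then links classified items to the metrics they affect. The following minimal sketch assumes hypothetical relation names ("inflates", "depresses", "adds_risk_to") and made-up figures, stores simple (item, relation, target, amount) triples, and uses them to reverse non-recurring items in reported net income. It illustrates the general idea only and is not our Core Earnings methodology.

```python
from collections import defaultdict

# Hypothetical triples: (item, relation, target metric, amount in millions).
TRIPLES = [
    ("Gain on sale of subsidiary", "inflates", "net_income", 85.0),
    ("Restructuring charges", "depresses", "net_income", 40.0),
    ("Guarantee of affiliate debt", "adds_risk_to", "balance_sheet", 300.0),
]


def build_graph(triples):
    """Index triples by target so all items affecting a metric are grouped."""
    graph = defaultdict(list)
    for item, relation, target, amount in triples:
        graph[target].append((item, relation, amount))
    return graph


def adjusted_net_income(reported: float, graph) -> float:
    """Remove items that inflate net income and add back items that depress it."""
    adjusted = reported
    for _item, relation, amount in graph["net_income"]:
        if relation == "inflates":
            adjusted -= amount
        elif relation == "depresses":
            adjusted += amount
    return adjusted


graph = build_graph(TRIPLES)
print(adjusted_net_income(150.0, graph))  # 150 - 85 + 40 = 105.0
```

Keeping relationships explicit means the same items can feed other questions, such as how much off-balance-sheet risk sits behind a given balance sheet, without reclassifying the data.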

For ontologies to work, they need accurate and consistent data. No ontology, or any model for that matter, can produce logical and meaningful signals if the underlying data is poor.
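Automated consistency checks are one practical way to catch poor inputs before an ontology or model consumes them. Here is a minimal sketch, assuming a hypothetical BalanceSheet record and a rounding tolerance of 0.5 (in millions), that flags negative total assets and violations of the accounting identity assets = liabilities + equity.

```python
from dataclasses import dataclass


@dataclass
class BalanceSheet:
    """Hypothetical minimal balance-sheet record (millions)."""
    total_assets: float
    total_liabilities: float
    total_equity: float


def validate(bs: BalanceSheet, tolerance: float = 0.5) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if bs.total_assets < 0:
        problems.append("negative total assets")
    gap = bs.total_assets - (bs.total_liabilities + bs.total_equity)
    if abs(gap) > tolerance:
        problems.append(f"assets differ from liabilities + equity by {gap:.1f}")
    return problems


print(validate(BalanceSheet(1000.0, 600.0, 400.0)))  # []
print(validate(BalanceSheet(1000.0, 600.0, 350.0)))
# ['assets differ from liabilities + equity by 50.0']
```

Returning a list of problems rather than raising on the first failure lets a pipeline log every issue in a filing at once and route it for review.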

Accordingly, we have gone to great effort to build technology that gathers proven-superior data from financial filings to train and power our AI agent, the Robo-Analyst.

We have unrivaled experience and success in endowing machines with the subject matter expertise needed to perform like human experts. Want proof? First, this paper by professors at Harvard Business School and MIT Sloan empirically proves the idiosyncratic alpha in our proprietary measure of Core Earnings.

Second, the indices based on our research and managed by Bloomberg strongly outperformed the S&P 500 in 1H25 and over the last five years. See Figures 1, 2, and 3.

  1. Bloomberg New Constructs Ratings VA-1 Index (ticker: BNCVA1T:IND)
  2. Bloomberg New Constructs Core Earnings Leaders Index (ticker: BCORET:IND)
  3. Bloomberg New Constructs 500 Index (ticker: B500NCT:IND)

The “Very Attractive Stocks” Index beat the S&P 500 by over 62 percentage points over the last five years. Bloomberg’s official name for the index is Bloomberg New Constructs Ratings VA-1 Index (ticker: BNCVA1T:IND). Figure 1 shows it was up 158% while the S&P 500 was up 96%.

Figure 1: Very Attractive-Rated Stocks Index Strongly Outperforms the S&P 500 Over the Last 5 Years

Sources: Bloomberg as of July 18, 2025
Note: Past performance is no guarantee of future results.

The Bloomberg New Constructs Core Earnings Leaders Index beat the S&P 500 by 45 percentage points over the last five years. Our index (ticker: BCORET:IND) was up 140% while the S&P 500 was up 96%. See Figure 2 for details.

Figure 2: Bloomberg New Constructs Core Earnings Leaders Index Outperforms the S&P 500 Over 5 Years

Sources: Bloomberg as of July 18, 2025
Note: Past performance is no guarantee of future results.

Our “Core-Earnings Weighted S&P 500” Index beat the S&P 500 by over 31 percentage points over the last five years. Bloomberg’s official name for the index is Bloomberg New Constructs 500 Total Return Index (ticker: B500NCT:IND). Figure 3 shows it was up 128% while the S&P 500 was up 96%.

Figure 3: Bloomberg New Constructs 500 Index Strongly Outperforms the S&P 500 Over the Last 5 Years

Sources: Bloomberg as of July 18, 2025
Note: Past performance is no guarantee of future results.
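The outperformance figures above compare five-year cumulative returns, so the gaps are differences in percentage points. This short Python sketch computes both that spread and the relative difference in ending wealth (growth of $1 in an index versus growth of $1 in the S&P 500), using the rounded returns quoted in the text. Because the inputs are rounded, a spread computed here can differ by about a point from one computed on Bloomberg’s unrounded data; the helper name compare is purely illustrative.

```python
def compare(index_return_pct: float, benchmark_return_pct: float) -> tuple[float, float]:
    """Return (percentage-point spread, relative outperformance in %)."""
    spread_pts = index_return_pct - benchmark_return_pct
    index_growth = 1 + index_return_pct / 100          # growth of $1
    benchmark_growth = 1 + benchmark_return_pct / 100  # growth of $1
    relative_pct = (index_growth / benchmark_growth - 1) * 100
    return spread_pts, relative_pct


# Rounded five-year cumulative returns as stated in the text.
RETURNS = {"BNCVA1T": 158.0, "BCORET": 140.0, "B500NCT": 128.0}
SP500 = 96.0

for ticker, ret in RETURNS.items():
    pts, rel = compare(ret, SP500)
    print(f"{ticker}: +{pts:.0f} pts, {rel:.1f}% more ending wealth")
```

With these rounded inputs, the Very Attractive index’s 62-point spread corresponds to roughly 31.6% more ending wealth than the S&P 500 (2.58 / 1.96 ≈ 1.316).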

Note that these indices are not available to the public. The only way to build strategies that achieve this kind of outperformance based on superior fundamental data is to be a New Constructs member.

Want to leverage our proven superior Robo-Analyst that can generate alpha in any market?

Learn more about our memberships here.

This article was originally published on July 21, 2025.

Disclosure: David Trainer, Kyle Guske II, and Hakan Salt receive no compensation to write about any specific stock, style, or theme.

Questions on this report or others? Join our online community and connect with us directly.
