At 53:09 in Episode #158 of the All-In podcast (clip attached), Chamath Palihapitiya asks for a research service exactly like New Constructs. Specifically, Mr. Palihapitiya calls for “AI that crawls 10-Ks and 10-Qs to generate statistical measurements of all public companies”. It’s uncanny how well he describes our value proposition. 

Watch the clip

We think the All-In crew (Chamath Palihapitiya, Jason Calacanis, David Sacks, and David Friedberg) would like to know that the service they ask for exists and that it produces proven superior research.

The need for our exact technology is underscored by recent findings that other large language models are unable to analyze and interpret SEC filings. A new study from Patronus AI led to these headlines:

  • “A startup tested if ChatGPT and other AI chatbots could understand SEC filings. They failed about 70% of the time and only succeeded if told exactly where to look” (Fortune)
  • “GPT and other AI models can’t analyze an SEC filing” (CNBC)
  • “SEC Filings Are So Complicated Even AI Is Baffled” (The Messenger Tech)
  • “ChatGPT and other AI models unable to analyze SEC Filings” (Tech Startups)

This CNBC article details more failings of other AI models trying to read SEC filings (a short sketch after the list reconciles the reported figures):

  • GPT-4-Turbo failed at the startup’s [Patronus AI] “closed book” test, where it wasn’t given access to any SEC source document. It failed to answer 88% of the 150 questions it was asked, and only produced a correct answer 14 times. It was able to improve significantly when given access to the underlying filings. In “Oracle” mode, where it was pointed to the exact text for the answer, GPT-4-Turbo answered the question correctly 85% of the time, but still produced an incorrect answer 15% of the time.
    • CNBC notes that “oracle mode” is an unrealistic test because it requires human input to find the exact pertinent place in the filing — the exact task many hope language models can address.
  • Llama 2, an open-source AI model developed by Meta, had some of the worst “hallucinations,” producing wrong answers as much as 70% of the time, and correct answers only 19% of the time, when given access to an array of underlying documents.
  • Anthropic’s Claude 2 performed well when given “long context,” where nearly the entire relevant SEC filing was included along with the question. It could answer 75% of the questions it was posed, gave the wrong answer for 21%, and failed to answer only 3%. GPT-4-Turbo also did well with long context, answering 79% of the questions correctly, and giving the wrong answer for 17% of them.
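
For readers who want to check how the “closed book” figures fit together, here is a short Python sketch. The inputs are the numbers CNBC quotes; the breakdown is our own arithmetic, not anything published by Patronus AI.

```python
# Reconciling the GPT-4-Turbo "closed book" figures quoted above.
# Inputs are the numbers reported by CNBC; the breakdown is our arithmetic.

TOTAL_QUESTIONS = 150          # size of the Patronus AI test set
FAILED_TO_ANSWER_RATE = 0.88   # share of questions with no answer given
CORRECT_ANSWERS = 14           # correct answers, per CNBC

failed_to_answer = round(FAILED_TO_ANSWER_RATE * TOTAL_QUESTIONS)  # 132
answered = TOTAL_QUESTIONS - failed_to_answer                      # 18
wrong = answered - CORRECT_ANSWERS                                 # 4

print(f"No answer: {failed_to_answer} of {TOTAL_QUESTIONS}")
print(f"Correct:   {CORRECT_ANSWERS} ({CORRECT_ANSWERS / TOTAL_QUESTIONS:.0%} of all questions)")
print(f"Wrong:     {wrong}")
```

In other words, the model declined to answer 132 of 150 questions and got only 14 right overall, roughly 9% accuracy without access to the filings.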

Our Robo-Analyst AI, powered by the proprietary training dataset we’ve been building for over 20 years, solves these problems. In addition, it parses important information from the footnotes and MD&A to produce materially superior fundamental datasets, models, analytics, and stock ratings.
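
To make the “crawls 10-Ks and 10-Qs” idea concrete, here is a minimal, hypothetical sketch of the first step only: listing a company’s recent 10-K and 10-Q filings via SEC EDGAR’s public submissions endpoint. This is illustrative and is not our Robo-Analyst pipeline; the CIK and User-Agent values are placeholders.

```python
import requests

# Minimal, hypothetical sketch: list a company's recent 10-K/10-Q filings
# using SEC EDGAR's public submissions endpoint. Not the Robo-Analyst pipeline.
# The SEC asks automated clients to identify themselves via User-Agent.

CIK = "0000320193"  # Apple Inc., zero-padded to 10 digits (example value)
HEADERS = {"User-Agent": "Example Research research@example.com"}  # placeholder

resp = requests.get(f"https://data.sec.gov/submissions/CIK{CIK}.json",
                    headers=HEADERS, timeout=30)
resp.raise_for_status()
recent = resp.json()["filings"]["recent"]

# The "recent" object holds parallel arrays; zip them into filing records.
for form, date, accession, doc in zip(recent["form"], recent["filingDate"],
                                      recent["accessionNumber"],
                                      recent["primaryDocument"]):
    if form in ("10-K", "10-Q"):
        # Document URL pattern: accession number without dashes in the path.
        url = (f"https://www.sec.gov/Archives/edgar/data/{int(CIK)}/"
               f"{accession.replace('-', '')}/{doc}")
        print(form, date, url)
```

Fetching the documents is the easy part; reliably parsing the footnotes and MD&A inside them is the hard part the headlines above describe.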

Not a client of New Constructs yet? Learn more.

This article was originally published on January 4, 2024.

Disclosure: David Trainer, Kyle Guske II, Italo Mendonca, and Hakan Salt receive no compensation to write about any specific stock, style, or theme.

Questions on this report or others? Join our Society of Intelligent Investors and connect with us directly.

Click here to download a PDF of this report.
