Is Automatic XBRL Tagging Feasible Using AI and LLM systems?
Research by Patronus AI has highlighted apparent challenges faced by large language models (LLMs), such as OpenAI’s GPT-4, in analysing financial data contained in US Securities and Exchange Commission (SEC) filings. The study found that even with access to extensive filings, the best-performing model at the time, GPT-4-Turbo, achieved only a 79% accuracy rate. XBRL International (XII) was surprised to find, as were the authors of this article, that the study had not made use of the XBRL data tags available for these reports.
Further research by XBRL International (XII) showed that “AI systems like OpenAI’s GPT-4 demonstrate improved performance in answering financial queries when fed with structured xBRL-JSON files converted from the SEC’s 10-K Inline XBRL reports”. Like XII, we consider this an intuitive result: using the semantic tags that companies themselves apply to their own financial data should produce better answers, and structured data can deliver significant benefits for financial disclosure analysis.
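To make this concrete, the sketch below shows why xBRL-JSON helps: each tagged fact carries its concept, period, and unit explicitly, so it can be flattened into unambiguous plain text for an LLM prompt instead of asking the model to locate figures in unstructured filing text. This is a simplified, hypothetical illustration — the field names loosely follow the OIM xBRL-JSON layout, and the concept names and dummy values are ours, not taken from any real filing.

```python
import json

# Hypothetical, heavily simplified excerpt of an xBRL-JSON document.
# Real documents follow the OIM xBRL-JSON specification and contain
# many more facts and dimensions; the values here are dummy numbers.
XBRL_JSON = """
{
  "facts": {
    "f1": {
      "value": "1000000",
      "dimensions": {
        "concept": "us-gaap:Revenues",
        "period": "2023-01-01T00:00:00/2024-01-01T00:00:00",
        "unit": "iso4217:USD"
      }
    },
    "f2": {
      "value": "250000",
      "dimensions": {
        "concept": "us-gaap:NetIncomeLoss",
        "period": "2023-01-01T00:00:00/2024-01-01T00:00:00",
        "unit": "iso4217:USD"
      }
    }
  }
}
"""

def facts_as_context(doc: dict) -> str:
    """Flatten tagged facts into one line each of plain text,
    suitable for inclusion in an LLM prompt as grounded context."""
    lines = []
    for fact in doc["facts"].values():
        dims = fact["dimensions"]
        lines.append(
            f"{dims['concept']} = {fact['value']} "
            f"({dims.get('unit', 'no unit')}, period {dims.get('period', 'n/a')})"
        )
    return "\n".join(lines)

doc = json.loads(XBRL_JSON)
context = facts_as_context(doc)
print(context)
```

Because every number arrives pre-labelled with its concept and reporting period, the model no longer has to infer which figure is which — which is plausibly why XII observed better query accuracy with the structured input.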
However, what if you reversed the process and asked AI and LLMs to tag a financial report with XBRL?
UBPartner has been undertaking some fundamental research on using natural language processing to identify key information in an annual report. Below is a summary of the initial UBPartner results, a look at what the latest developments in LLMs might offer to improve performance, and an initial view on where this is heading. This is part of a series of articles looking into key issues around Digital Reporting using XBRL.
The rest of this article can be read on Medium for free – here