Why abstract encoding of labels helps neither AI nor humans understand XBRL data.
This post was inspired by an article on Medium about why encoding data labels can introduce challenges for AI systems. The notion that abstract encoding is bad for Large Language Models (LLMs) led to the thought that, for similar reasons, encoding must also be unhelpful for human understanding.
While generating encoded labels is essential for various IT applications, it introduces particular challenges for XBRL collection systems such as the European Banking Authority's (EBA) reporting framework. The EBA uses both the Data Point Methodology (DPM) and the eXtensible Business Reporting Language (XBRL) in the implementation of its Capital Requirements Directive (CRD) reporting system.
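A minimal sketch of the core problem (the codes and labels below are made up for illustration and are not real EBA identifiers): a DPM-generated taxonomy identifies each data point by an abstract code, so the meaning lives in a separate label dictionary that must be joined in before a human, or an LLM, can read the data.

```python
# Hypothetical encoded report: abstract metric codes mapped to values.
# (Illustrative codes only, not actual DPM/EBA identifiers.)
encoded_report = {
    "eba_met:md123": 1_500_000,
    "eba_met:md456": 250_000,
}

# Label dictionary assumed to ship alongside the taxonomy (illustrative).
labels = {
    "eba_met:md123": "Gross carrying amount of loans and advances",
    "eba_met:md456": "Accumulated impairment",
}

def decode(report: dict, labels: dict) -> dict:
    """Replace abstract codes with human-readable labels,
    leaving unknown codes as-is."""
    return {labels.get(code, code): value for code, value in report.items()}

readable_report = decode(encoded_report, labels)
# Without the decoding step, a reader sees only opaque codes;
# with it, the same values become self-describing.
```

The point is not the code itself but the extra indirection: every consumer of the data, human or machine, must carry and apply this mapping before the figures mean anything.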
The EBA generates the XBRL taxonomy from its internal DPM model and provides it to 27 European country authorities, which collect the XBRL reports from thousands of banks. The move to a new, more compact collection format, XBRL-CSV, is an opportunity to consider how the XBRL model is generated and how it is understood by the banks required to report in it. It is also a good time to review how the XBRL data is analysed to support the supervision of banks in Europe.
This article evaluates the impact of the DPM encoding in the XBRL taxonomy and how it affects the communication of the reporting requirements: does it cause significant additional effort to implement and maintain the collection and reporting systems? It also asks whether DPM encoding acts as a barrier to using advanced AI tools in the future to discover potentially useful information in the large datasets being collected.
The rest of this article can be read for free on Medium – here