Artificial intelligence and the analysis of text data — how NLP helps impact investors

Investment researchers use natural language processing to analyse large amounts of data to improve value to stakeholders

Jun 30, 2021

Artificial intelligence has transformed impact analysis and reporting. AI-based applications enable analysts to:

collect, clean and process large amounts of financial, ESG and other data using automated programmes and algorithms, thereby freeing up human resources to conduct more nuanced tasks
analyse and present data in order to provide relevant insights and reports to various stakeholders like investors, communities and government authorities
create machine learning algorithms that can predict future outcomes in order to aid planning, make forecasts and identify opportunities

Despite the extensive datasets and massive computing power now available, misconceptions persist about which data are relevant for impact investing. As a result, critical information may be overlooked.

One of the myths relating to investment decisions is that only numerical data can be analysed using artificial intelligence.

What data can be measured? Numbers vs text

When it comes to collecting and cleaning data, much emphasis is placed on ensuring that categorical text descriptions are translated into numerical values to enable machine learning algorithms to weave their magic. In addition, the perception is that quantitative data is superior to qualitative data because the latter is subjective and cannot be measured.
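To make the idea of translating categorical text into numbers concrete, here is a minimal sketch of one common approach, one-hot encoding, written in plain Python. The `one_hot_encode` function and the sector labels are illustrative assumptions rather than part of any particular toolkit.

```python
def one_hot_encode(labels):
    """Map each distinct text label to a 0/1 indicator vector."""
    # Sort the distinct labels so the column order is reproducible.
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1  # mark the column that matches this label
        vectors.append(row)
    return categories, vectors


# Hypothetical sector labels for four investee companies.
sectors = ["energy", "health", "energy", "education"]
categories, vectors = one_hot_encode(sectors)

print(categories)
for sector, row in zip(sectors, vectors):
    print(sector, row)
```

Each company ends up with a row containing a single 1 in the column for its sector, which is a form most machine learning algorithms can consume. Note that this captures category membership only, not the meaning of the words themselves.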

The reality is that information in the form of text is far from useless or irrelevant to impact measurement. In fact, natural language processing (NLP) has emerged as a critical AI tool for impact investing, especially when conducting due diligence audits on corporations and other organisations.

What is natural language processing?

“Natural language processing (NLP) refers to the branch of computer science — and more specifically, the branch of artificial intelligence or AI — concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.” — IBM

It entails programming a machine to search for words that convey a certain tone or sentiment, to recognise and transcribe voice data into text that can be further analysed, to verify supporting information from a variety of electronic sources, and a number of other tasks that humans carry out when we process text or speech in our everyday lives.

How does NLP help impact investors?

Financial data does not provide a complete picture when investing for environmental, social or governance impact. An increasing number of investment researchers and institutional lenders are analysing text and speech data to assess the value and risks associated with potential investments.

1. Reputation and public sentiment

Public sentiment can be gauged by analysing tone in speech and text of news reports, social media posts and other online sources. What is significant is that these sources of information are usually not within the control of the potential investee company, in contrast to much of the documentation that is provided to investors during the proposal process.

A high incidence of comments that convey negative sentiments or opinions about a company can be an indication that its social licence to operate or its reputation is at risk.
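As a rough illustration of how the incidence of negative comments might be measured, the sketch below scores each post against two small word lists and reports the share of posts that lean negative. The word lists, function names and sample posts are all assumptions made for this example; production systems typically rely on much larger lexicons or trained models.

```python
import re

# Tiny illustrative word lists; a real sentiment lexicon would be far larger.
POSITIVE = {"good", "great", "trusted", "responsible", "excellent"}
NEGATIVE = {"bad", "toxic", "scandal", "unsafe", "misleading"}


def sentiment_score(text):
    """Return (positive - negative) word count divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return (positive - negative) / len(words)


def negative_share(posts):
    """Fraction of posts whose sentiment score is below zero."""
    if not posts:
        return 0.0
    return sum(sentiment_score(post) < 0 for post in posts) / len(posts)


# Hypothetical social media posts about a company.
posts = [
    "Great products from a trusted brand",
    "Another scandal, this company is unsafe",
    "Their reporting is misleading",
    "Delivery arrived on Tuesday",
]
print(negative_share(posts))  # 0.5: two of the four posts lean negative
```

A simple word count like this ignores negation and sarcasm, so it is best treated as a first-pass signal that prompts closer reading rather than as a verdict.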

2. Due diligence and document review

The due diligence process requires the reading and review of a large number of documents and other records. Institutional investors typically hire teams of lawyers, accountants and other risk professionals for this purpose. In addition to the monetary cost, the due diligence process can take weeks or months to complete, resulting in an opportunity cost that is difficult to even measure.

The banks that finance investments also need to process, review and verify stacks of application documents and information. When JPMorgan Chase realised that this process of document review amounted to about 360,000 hours of work every year, it created an NLP system that not only cut these hours down to a few seconds, but also almost completely eradicated clerical errors.

3. Risk of future liabilities

Potential future liabilities may also be highlighted through text analysis. For example, if a company issues a press release about an award it has won for sustainability, and a large proportion of social media responses refer to its practice of dumping toxic waste that contaminates the drinking water of local communities, a potential investor should view this as a red flag.
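The press release scenario can be sketched as a simple check: what share of public responses mention a term associated with an environmental concern? The term list, the 30% threshold and the sample responses below are hypothetical choices for illustration, not established benchmarks.

```python
import re

# Hypothetical concern terms an analyst might track for environmental liability.
CONCERN_TERMS = {"toxic", "waste", "dumping", "contaminated", "pollution"}


def mentions_concern(text, terms=CONCERN_TERMS):
    """True if the text contains at least one concern term as a whole word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & terms)


def red_flag(responses, threshold=0.3):
    """Return (share, flagged), where share is the fraction of responses
    mentioning a concern term and flagged is True at or above the threshold."""
    if not responses:
        return 0.0, False
    share = sum(mentions_concern(r) for r in responses) / len(responses)
    return share, share >= threshold


# Hypothetical replies to a press release about a sustainability award.
responses = [
    "Congratulations on the award",
    "What about the toxic waste in our river?",
    "Stop dumping near the village",
    "Well deserved",
]
print(red_flag(responses))  # (0.5, True)
```

A flag of this kind does not establish liability; it only indicates that the gap between the company's own messaging and public responses deserves investigation.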

4. Customer satisfaction

Customer satisfaction can be assessed from online reviews, emails and other correspondence. A high incidence of negative words signalling complaints or dissatisfaction with a product or service can indicate that further investigation is needed into how this may affect the sustainability of the corporation.

5. Fraud detection and governance risk

Statistically significant patterns of words that indicate employee or customer dishonesty (for example, fraudulent activity or money laundering) can be identified through topic modelling.

A popular case study in data science is the Enron Corpus (a database of over 600,000 emails of Enron employees from the period leading up to the organisation’s collapse). It is fascinating to see the variety of projects relating to NLP research and machine learning modelling that have been inspired by the availability of this dataset.

6. Information security risk

The risk of an organisation’s employees falling victim to scams and phishing can be gauged from the incidence or patterns of words and phrases that are commonly used by scammers. This can also help detect weaknesses in the information security system.
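A very simple version of this kind of pattern matching is sketched below: it counts how often phrases associated with phishing appear in each message and flags messages with several matches. The phrase list, the `min_hits` threshold and the sample messages are assumptions made for illustration; real filters combine many more signals.

```python
import re
from collections import Counter

# Hypothetical phrases often associated with phishing; illustrative only.
SUSPICIOUS_PHRASES = [
    "verify your account",
    "urgent action required",
    "click the link below",
    "confirm your password",
]


def phrase_hits(message, phrases=SUSPICIOUS_PHRASES):
    """Count occurrences of each suspicious phrase in one message."""
    # Lower-case and collapse whitespace so spacing tricks do not hide a phrase.
    normalised = re.sub(r"\s+", " ", message.lower())
    hits = Counter()
    for phrase in phrases:
        count = normalised.count(phrase)
        if count:
            hits[phrase] = count
    return hits


def flag_messages(messages, min_hits=2):
    """Return indices of messages with at least min_hits phrase matches."""
    return [
        i
        for i, message in enumerate(messages)
        if sum(phrase_hits(message).values()) >= min_hits
    ]


# Hypothetical inbox sample.
messages = [
    "Urgent action required: verify your account today.",
    "Minutes from Monday's board meeting are attached.",
    "Please click the link below to confirm   your password.",
]
print(flag_messages(messages))  # [0, 2]
```

Tracking how many such messages reach employees, and how many are acted upon, gives a rough indication of exposure that can be followed up with a fuller security review.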

Further use cases of NLP in information security include identification of malicious domains visited by employees, vulnerable segments or patterns in source code, and other vulnerabilities in an organisation’s system infrastructure.

Any shortcomings highlighted during these processes enable potential investors to assess future risk and to estimate more appropriate pricing for investments, taking into account the cost of potentially replacing or upgrading the information security system.

7. Overall sentiment analysis

More generally, market intelligence data can be mined from online sources like news items, industry reports, and information contained in repositories. Current market conditions and trends can be inferred from disproportionate usage of words indicating either positive or negative sentiments. This enables investors to identify potential risks and opportunities in particular industries or markets.

Conclusion

As humans, we make inferences and draw conclusions from everyday language that inform our decision-making. Natural language processing enables machines to be programmed to imitate many aspects of these processes. This is particularly valuable for impact investing, where financial data alone does not provide enough information to adequately assess the risks and viability of potential investments.