When senior decision-makers are asked what they look for when evaluating document automation solutions, their answers tend to converge around three core questions:
- Can the offering solve my problem?
- How easy is it to use?
- Can we scale this solution to meet future needs?
These are subjective questions that are hard to answer. Understandably, they often get proxied by readily available quantitative data that vendors are only too happy to provide: model benchmarks such as accuracy or F1 scores, processing times, uptimes and so on. While these metrics are useful for understanding the immediate, short-term efficacy of a solution, they provide little guidance on long-term utility and scalability.
In evaluating the long-term efficacy of a process automation solution, alternative considerations may be more useful.
Quality Awareness
Does the solution understand its own limitations?
In a situation in which the slightest mistake can result in millions, if not billions, in financial losses, model accuracy by itself means little if it falls short of 100%. At the same time, having every machine action double-checked by a human subject matter expert (SME) destroys any hope of high rates of straight-through processing or meaningful gains from automation.
The ideal solution should have a strong sense of its own ability to deliver results in a variety of circumstances, and be able to judiciously escalate to human SMEs.
The cost of a mistake should be carefully balanced against the cost of human inspection, with a circumstance-based self-predicted quality score driving the decision to escalate to humans.
Deep Dive: Quality awareness is measured in errors. Simply put, Type 1 errors are mistakes that the system categorizes as correct, and therefore represent risk. Type 2 errors are correct results that the system flags as mistakes, and therefore result in a manual check. As the cost of a mistake goes up, the number of Type 1 errors must come down. In most financial services use cases, clients should drive Type 1 errors as close to zero as possible, even at the cost of a slight increase in Type 2 errors, to ensure that risk is minimized.
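To make the trade-off concrete, here is a minimal sketch of confidence-based escalation, assuming the system exposes a self-predicted quality score per extracted result; the threshold value and field names are hypothetical, not drawn from any specific product.

```python
# Minimal sketch: route results by self-predicted confidence (hypothetical threshold).
from dataclasses import dataclass

@dataclass
class Extraction:
    field: str
    value: str
    confidence: float  # self-predicted quality score in [0, 1]

def route(extraction: Extraction, threshold: float = 0.98) -> str:
    """Accept high-confidence results; escalate the rest to a human SME.

    Raising the threshold pushes Type 1 errors (mistakes accepted as correct)
    toward zero, at the cost of more Type 2 errors (correct results sent for
    manual review). The threshold should balance the cost of a mistake
    against the cost of human inspection.
    """
    return "straight-through" if extraction.confidence >= threshold else "escalate-to-SME"

# Example: a trade-notional field read with 93% confidence gets escalated.
print(route(Extraction("notional", "USD 25,000,000", 0.93)))  # escalate-to-SME
```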
Context Awareness
Does the solution deeply understand the underlying context?
As the saying goes, an infinite number of monkeys, given infinite time at typewriters, will eventually produce the complete works of Shakespeare. The artificial intelligence analogy is that, given enough training examples and computational power, one can produce a high-accuracy model that appears to solve the problem at hand.
However, such overengineered solutions fail to stand the test of time: real-life instances start to drift away from historical examples, and retraining efforts fail to restore the solution to its promised quality levels.
A well-designed product should attempt to understand things and not just strings.
Once a solution can represent and persist contextual metadata, its failures over time become far more explainable, allowing for nuanced and careful retraining or reconfiguration. It also lets the solution's capabilities expand well beyond the original use case as business needs evolve.
Deep Dive: For example, a system that reads the words “John Smith” should know that this is not just text but the name of a person, and should recognize whether this person has been encountered by the system before. Further, it should enrich its knowledge of John Smith each time new information is acquired and, if multiple John Smiths are known to the system, cross-check each reference to make sure it points to the right one.
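As an illustration of “things, not strings,” the following sketch resolves a mention of “John Smith” against previously seen entities; the class names and matching heuristics are hypothetical and stand in for whatever entity-resolution approach a given product uses.

```python
# Illustrative sketch: resolve a name mention to a known entity using context,
# enriching the entity when new attributes arrive (hypothetical heuristics).
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    attributes: dict = field(default_factory=dict)  # e.g. employer, account id

class EntityStore:
    def __init__(self):
        self.people: list[Person] = []

    def resolve(self, name: str, context: dict) -> Person:
        """Return the existing person this mention refers to, or create one.

        When several people share a name, contextual attributes (employer,
        account id, document source) disambiguate rather than matching on
        the string alone.
        """
        candidates = [p for p in self.people if p.name == name]
        for person in candidates:
            if any(person.attributes.get(k) == v for k, v in context.items()):
                person.attributes.update(context)  # enrich with new information
                return person
        person = Person(name, dict(context))
        self.people.append(person)
        return person

store = EntityStore()
a = store.resolve("John Smith", {"employer": "Acme Bank"})
b = store.resolve("John Smith", {"employer": "Acme Bank", "account": "123"})
c = store.resolve("John Smith", {"employer": "Globex"})
print(a is b, a is c)  # True False: same person enriched, a different John Smith kept apart
```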
Queryability
Beyond simplicity of use, can the solution intelligently guide users when things go wrong? Can the solution help decision-makers gain insight from data created and collected during process automation?
Having a simple and intuitive product that business users can pick up with little or no training is valuable. However, a more important and often overlooked criterion is the solution’s response to failure. Operational errors can be expensive, high-stress affairs. It is imperative that failures be discovered quickly and that users are guided to the next course of action. Further, decision-makers need to be able to easily access, analyze and glean insights from automation logs in order to evolve operational processes with the ever-changing landscape of business.
Deep Dive: As an example, consider a situation in which regulatory changes require financial services firms to have insurance coverage for a new risk event. One might then need to understand whether vendor, supplier and counterparty contracts cover such risks or need modification. This often takes a team of legal SMEs days, if not weeks, of manual review. Having someone crawl through thousands of contracts isn’t the answer. What is needed is the ability to query documents with a system that leverages context and synonyms, and can combine qualitative and quantitative metadata with linked data from other documents.
Queryability calls for a risk management system that is able to understand and intelligently query contractual documents at scale to categorize and escalate noncompliant contracts.
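A minimal sketch of such a query, under the assumption that clauses and metadata have already been extracted from each contract, might look like the following; the synonym list, field names and compliance rule are illustrative only.

```python
# Hypothetical sketch: query contracts by meaning rather than exact words.
# A risk term also matches its listed synonyms, and metadata filters the scope.
from dataclasses import dataclass

@dataclass
class Contract:
    counterparty: str
    coverage_clauses: list[str]  # clause text extracted from the document
    metadata: dict               # e.g. {"jurisdiction": "UK"}

# Synonym sets would normally come from a domain ontology; these are examples.
SYNONYMS = {"cyber incident": {"cyber incident", "cyber attack", "data breach"}}

def find_noncompliant(contracts: list[Contract], risk_term: str, jurisdiction: str) -> list[Contract]:
    """Flag contracts in a jurisdiction whose clauses never mention the risk
    (or any synonym), so they can be escalated for legal review."""
    terms = SYNONYMS.get(risk_term, {risk_term})
    flagged = []
    for c in contracts:
        if c.metadata.get("jurisdiction") != jurisdiction:
            continue
        text = " ".join(c.coverage_clauses).lower()
        if not any(t in text for t in terms):
            flagged.append(c)
    return flagged

contracts = [
    Contract("Acme Ltd", ["Coverage includes losses arising from a data breach."], {"jurisdiction": "UK"}),
    Contract("Globex plc", ["Coverage limited to property damage."], {"jurisdiction": "UK"}),
]
print([c.counterparty for c in find_noncompliant(contracts, "cyber incident", "UK")])  # ['Globex plc']
```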
What technology can automate today is astounding, surpassed only by what it promises to deliver tomorrow. Asking the right questions and evaluating solutions for long-term value creation can help businesses gain a significant edge over their competitors in the coming decade.

A veteran of the financial services industry, Prashant Vijay is currently chief executive at Romulus, which specializes in building software products that automate document-heavy operations in the financial services industry. He has spent more than two decades working at the intersection of technology and data across multiple roles and geographies. His views are informed by his experience in tech and business roles at Goldman Sachs, and his sales, product and business management roles at IHS Markit.