A framework for intelligent document image enhancement
in pursuit of improved OCR performance
Ryno Kleinhans* and Stephan Nel,
University of Stellenbosch
SAMS Subject Classification Number: 12, 23, 25
A characteristic trait of the age of digitalisation is the ubiquitous transition from paper-reliant, manual business processes to fully digital, computer-assisted and automated versions thereof. Although many industries have already begun this transition away from paper documents, several real-world information chains are still intertwined with downstream paper-based systems. Some of these systems might require several decades to become fully digital. Consequently, in order to fully automate these processes, the paper-based documents must be digitised.
Computerised approaches, e.g. optical character recognition (OCR) engines, have achieved notable success in accurately converting pixel-based information into machine-encoded information. The performance of these engines is, however, reliant on the quality of the captured document images. Although there is a plethora of image enhancement techniques designed to increase image quality, the implementation of some of these techniques depends heavily on human judgement, as each document image requires a unique set of preprocessing steps. Accordingly, the application of data-driven approaches from the realm of machine learning, and more specifically deep learning, warrants consideration within the presented context.
In this presentation, a high-level overview is provided of a framework that aims to facilitate text extraction from document images by automating the preprocessing stage, intelligently identifying which combination of image enhancement techniques to apply to each individual image. Powerful approaches from the domain of computer vision, together with transfer learning, are considered.
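The idea of selecting a combination of enhancement techniques per image can be illustrated with a minimal sketch. It is not the framework presented here: the technique names, the `scores` dictionary (standing in for the output of a trained multi-label classifier), the `cutoff` value and the fixed application order are all assumptions made purely for illustration.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # grayscale pixel values in 0..255


def contrast_stretch(img: Image) -> Image:
    """Linearly rescale pixel values so they span the full 0..255 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def binarise(img: Image, threshold: int = 128) -> Image:
    """Map each pixel to black (0) or white (255) using a global threshold."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


# Hypothetical catalogue of techniques; dictionary order is the application order.
TECHNIQUES: Dict[str, Callable[[Image], Image]] = {
    "contrast_stretch": contrast_stretch,
    "binarise": binarise,
}


def select_techniques(scores: Dict[str, float], cutoff: float = 0.5) -> List[str]:
    """Keep every technique whose (assumed) classifier score reaches the cutoff."""
    return [name for name in TECHNIQUES if scores.get(name, 0.0) >= cutoff]


def enhance(img: Image, scores: Dict[str, float], cutoff: float = 0.5) -> Image:
    """Apply the selected techniques to the image, one after another."""
    for name in select_techniques(scores, cutoff):
        img = TECHNIQUES[name](img)
    return img


if __name__ == "__main__":
    image = [[100, 120], [140, 160]]
    # Scores are hard-coded here; in practice a model would predict them per image.
    print(enhance(image, {"contrast_stretch": 0.9, "binarise": 0.8}))
```

Treating the selection as a set of independent per-technique scores allows any combination of techniques to be chosen for a given image, which is one plausible way to frame the problem as multi-label classification.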