Patent for License:

High Accuracy, Automated Data Capture and Intelligent Document Management System

Advanced form recognition and processing technology that automatically processes information in any format, including paper, PDFs, and other electronic images.


The system captures and atomizes 100% of the data from the original form and lets users store, process, recognize, compute, and visualize all of the extracted information through an intuitive web-based interface. No manual handling is required, form recognition accuracy is 99% or higher, and administrators can specify access privileges and processing rules for each individual field.

The technology overcomes two major challenges of automated data extraction. The first is achieving high-accuracy recognition despite document artifacts introduced by printing, photocopying, faxing, scanning, and other handling, all of which alter the pixel map of a form instance. This problem is solved through on-the-fly mapping of structural elements to create dynamic templates. The solution's precise pixel mapping means that all data, including checkboxes, machine text, stamps, images, annotations, and handwriting, can be accurately extracted.
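The idea of a dynamic template can be illustrated with a minimal sketch. It is not the patented method: it assumes that two structural anchor points (for example, form corners) have already been located in both a reference form and a scanned instance, and that handling introduced only scaling and shifting, not rotation or skew. The names `fit_axis` and `build_dynamic_template` are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def fit_axis(ref_a: float, ref_b: float, obs_a: float, obs_b: float) -> Tuple[float, float]:
    """Return (scale, offset) such that observed = scale * reference + offset."""
    if ref_a == ref_b:
        raise ValueError("reference anchors must differ on this axis")
    scale = (obs_b - obs_a) / (ref_b - ref_a)
    offset = obs_a - scale * ref_a
    return scale, offset


def build_dynamic_template(
    ref_anchors: Tuple[Point, Point],
    obs_anchors: Tuple[Point, Point],
    ref_fields: Dict[str, Box],
) -> Dict[str, Box]:
    """Map reference field boxes into the pixel space of one scanned instance.

    Assumption: the scan differs from the reference only by per-axis scale
    and translation, estimated from two matching anchor points.
    """
    (rx1, ry1), (rx2, ry2) = ref_anchors
    (ox1, oy1), (ox2, oy2) = obs_anchors
    sx, tx = fit_axis(rx1, rx2, ox1, ox2)
    sy, ty = fit_axis(ry1, ry2, oy1, oy2)
    return {
        name: (sx * left + tx, sy * top + ty, sx * right + tx, sy * bottom + ty)
        for name, (left, top, right, bottom) in ref_fields.items()
    }
```

A real system would need more anchors and a richer transform to absorb rotation and local distortion; the sketch only shows why a template computed per instance tolerates artifacts that a fixed pixel template would not.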

The second challenge is incorporating new forms, in which information fields are moved, altered, deleted, or added, with no interruption in processing. The system addresses this challenge with genetic algorithms and artificial intelligence that map form distances to build semantic form relationships. Semantic mapping at the form and data element level allows business rules to establish meaning and relationships between forms and their elements, enabling full context-based searching.
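One way to picture a form distance is sketched below. This is an illustrative assumption, not the system's algorithm: it is a plain layout distance rather than a genetic algorithm, although a search procedure of that kind could use such a function as its fitness measure. Each form is assumed to be a mapping from field label to a normalized page position, and the names `form_distance` and `closest_known_form` are hypothetical.

```python
import math
from typing import Dict, Tuple

# Field label -> (x, y) position in normalized 0..1 page coordinates (assumed).
Layout = Dict[str, Tuple[float, float]]


def form_distance(a: Layout, b: Layout, missing_penalty: float = 1.0) -> float:
    """Average per-field cost between two layouts.

    Fields present in both forms cost their displacement (a moved field);
    fields present in only one form cost a fixed penalty (added or deleted).
    """
    labels = set(a) | set(b)
    if not labels:
        return 0.0
    total = 0.0
    for label in labels:
        if label in a and label in b:
            total += math.dist(a[label], b[label])
        else:
            total += missing_penalty
    return total / len(labels)


def closest_known_form(new_form: Layout, known: Dict[str, Layout]) -> str:
    """Return the name of the known form with the smallest distance."""
    return min(known, key=lambda name: form_distance(new_form, known[name]))
```

Under these assumptions, a revised form whose fields have only shifted stays close to its predecessor and can inherit its field definitions, while an unrelated form scores near the missing-field penalty.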

The current 1.0 solution, designed to address the challenges described above, uses advanced form recognition, precise pixel mapping, and XML business tags to create full context-based search, aggregation, and analytic capabilities for all data elements on a page.

The 2.0 solution, soon to be launched, contains a modular and integrated image processing and recognition platform that uses optical character recognition (OCR) for machine text recognition, optical mark recognition (OMR) for checkbox recognition and computation, and advanced intelligent character recognition (aICR) for simple hand stroke recognition. The company is also developing offline cursive handwriting recognition (HWR) to enable effective word-spotting search in handwritten notes. This platform enables content-within-context search and provides high-accuracy recognition for data conversion and computation.
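As an illustration of the OMR step, checkbox recognition is commonly reduced to measuring how much of a box region is dark. The sketch below assumes a grayscale image stored as rows of 0 to 255 pixel values and a checkbox region that has already been located; the function names and both thresholds are hypothetical and would need tuning on real scans.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def fill_ratio(image: List[List[int]], box: Box, dark_threshold: int = 128) -> float:
    """Fraction of pixels inside the box that are darker than the threshold."""
    left, top, right, bottom = box
    pixels = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
    if not pixels:
        raise ValueError("box covers no pixels")
    dark = sum(1 for value in pixels if value < dark_threshold)
    return dark / len(pixels)


def is_checked(image: List[List[int]], box: Box, min_fill: float = 0.25) -> bool:
    """Treat the checkbox as marked when enough of its interior is dark."""
    return fill_ratio(image, box) >= min_fill
```

Because the result is a boolean per field, marked boxes can be counted or combined directly, which is the kind of computation on checkbox data that the platform description mentions.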

Additional Information

Technology available for commercialization. Patent pending.

Patent Summary

U.S. Patent Classes & Classifications Covered in this Patent:

Class 707: Data Processing: Database And File Management Or Data Structures

This is the generic class for data processing apparatus and corresponding methods for the retrieval of data stored in a database or as computer files. It provides for data processing means or steps for generic data, file and directory upkeeping, file naming, and file and database maintenance including integrity consideration, recovery, and versioning. There are three main divisions: 1. database and file accessing; 2. database schema and data structure; 3. file and database maintenance.

Subclass 102: Generating database or data structure (e.g., via user interface)