Ephesoft: Linux, Nuance and MongoDB
I'm writing this post to share two things: (1) my first experience with Ephesoft on Linux, and (2) how I integrated Ephesoft with the NoSQL database MongoDB.
Ephesoft installation on Linux
I tried to install Ephesoft on Linux with the first release candidates a few months ago, and it was not a big success for me. So I decided to wait for the first official Linux version, which was released a few weeks ago. I can say that I was pleasantly surprised. I just set up a standard Ubuntu Desktop virtual machine, and the installation was really easy. Ephesoft provides an installer that takes care of everything, including dependencies. In less than 10 minutes, my server was up and running. The Ephesoft team did a really good job!
In terms of performance, I didn't spend time comparing my Windows VM and my Ubuntu VM, but I have the feeling that the same batch instance was processed more quickly on my Ubuntu VM. It would be really interesting to compare performance on similar VMs with the same batch class.
Batch Class Creation
For this NoSQL use case, I wanted to create a new batch class from scratch on my Linux server. My goals were:
- To create a batch class from scratch
- To deactivate the review and the validation steps
- To configure this batch class to use fixed form extraction on the I-9 form. The Employment Eligibility Verification Form I-9 is a U.S. Citizenship and Immigration Services form. It is used by an employer to verify an employee's identity and to establish that the worker is eligible to accept employment in the United States. You can download a sample here: http://www.uscis.gov/sites/default/files/files/form/i-9.pdf
- To export all extracted fields, as well as the PNG files created during Ephesoft processing, to MongoDB.
The first and second goals are quite standard in Ephesoft, and you follow the same steps whether you run Ephesoft on Windows or on Linux. I'm not going to describe them.
Fixed form extraction
When you use fixed form extraction, you need to tell Ephesoft where the information is located in the document. With Ephesoft on Windows, you use Recostar: you create a new Recostar project and configure it to find all the information. Once this project is created, you just assign it to your batch class using the web user interface.
With Ephesoft on Linux, you can't use Recostar. Ephesoft uses another tool called Nuance. The first thing to do is to find a tool to create the Nuance file. The one recommended by Ephesoft is OmniPage Ultimate (http://www.nuance.com/for-business/by-product/omnipage/ultimate/index.htm), and there is a short documentation page explaining how to use it: http://wiki.ephesoft.com/createzonnuance. I got two surprises. The first is that OmniPage is not available on Linux or on MacOS, so I had to install another Windows VM (with Windows XP) to be able to create my Nuance template. The second is that OmniPage Ultimate is not free: it costs almost $500. I searched online for an alternative but didn't find any. Let me know if you know of any other tools.
For the last goal, the MongoDB export, I wanted to demonstrate how powerful Ephesoft is. Using its plugin mechanism, I created a new plugin (following the rules described in my previous posts) to export documents into MongoDB. I'm not going to explain in depth how it's done; I simply followed the same steps as for the DropBox export plugin. Using the MongoDB Java driver, I'm able to send all the fields, as well as the thumbnail PNG file and the original PNG file generated by Ephesoft for the UI.
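To give an idea of what such an export might look like, here is a minimal sketch of how one record could be assembled before being handed to the MongoDB Java driver. This is not the actual plugin code: the class name, the key names (batchInstanceId, documentType, fields, thumbnail, image) and the sample field LastName are all assumptions made for illustration. The sketch uses only the JDK so that it runs on its own; with the driver on the classpath, the resulting map could be wrapped in an org.bson.Document and passed to MongoCollection.insertOne.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Ephesoft or of the MongoDB driver.
class MongoExportSketch {

    /**
     * Builds the record for one exported document: the extracted fields
     * plus the two PNG images as raw bytes. All key names are assumptions.
     */
    static Map<String, Object> buildRecord(String batchInstanceId,
                                           String documentType,
                                           Map<String, String> fields,
                                           byte[] thumbnailPng,
                                           byte[] displayPng) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("batchInstanceId", batchInstanceId);
        record.put("documentType", documentType);
        // Copy the field map so later changes by the caller do not leak in.
        record.put("fields", new LinkedHashMap<>(fields));
        // The driver stores byte arrays as BSON binary data.
        record.put("thumbnail", thumbnailPng.clone());
        record.put("image", displayPng.clone());
        return record;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("LastName", "Doe"); // hypothetical extracted field

        // Stand-in bytes (the start of the PNG signature), not a real image.
        byte[] png = {(byte) 0x89, 0x50, 0x4E, 0x47};

        Map<String, Object> record =
                buildRecord("BI1", "I-9", fields, png, png);

        // With the MongoDB Java driver available, the next step would be:
        //   collection.insertOne(new org.bson.Document(record));
        System.out.println(record.keySet());
    }
}
```

Keeping the fields in a nested map means every extracted value stays searchable by name, while the two images travel with the record as binary data. The 16 MB limit on a single MongoDB document should be kept in mind for large images.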
Here is the interesting piece of my sample document:
And here is what you can find in your Mongo database at the end of the export:
In parallel, I wanted to create a quick document browser based on Node.js, connected to the same MongoDB instance. My requirements were simple:
- Be able to search documents by any metadata
- Be able to view details of a document
- Be able to see the document in its original size
- Be able to zoom in on the document
I was able to achieve that really quickly using Express. Here is a quick demonstration of the interface: