Ephesoft: Linux, Nuance and MongoDB

I'm writing this post to share two things: (1) my first experience with Ephesoft on Linux, and (2) how I integrated Ephesoft with the NoSQL database MongoDB.

Ephesoft installation on Linux

I tried to install Ephesoft on Linux with the first release candidates a few months ago, and it was not a big success. So I decided to wait for the first official Linux version, released a few weeks ago, and I was pleasantly surprised. I just set up a standard Ubuntu Desktop virtual machine, and the installation was really easy: Ephesoft provides an installer that takes care of everything, including dependencies. In less than 10 minutes, my server was up and running. The Ephesoft team did a really good job!

In terms of performance, I didn't spend time comparing my Windows VM and my Ubuntu VM, but I have the feeling that the same batch instance was processed more quickly on the Ubuntu VM. It would be interesting to compare performance on similar VMs with the same batch class.

Batch Class Creation

For this NoSQL use case, I wanted to create a new batch class from scratch on my Linux server. My goals were:

  1. To create a batch class from scratch
  2. To deactivate the review and the validation steps
  3. To configure this batch class to use fixed form extraction on the I-9 form. The Employment Eligibility Verification Form I-9 is a U.S. Citizenship and Immigration Services form. An employer uses it to verify an employee's identity and to establish that the worker is eligible to accept employment in the United States. You can download a sample here: http://www.uscis.gov/sites/default/files/files/form/i-9.pdf
  4. To export all extracted fields, as well as the PNG files created during Ephesoft processing, to MongoDB.

The first and second goals are quite standard in Ephesoft, and the steps are the same on Windows and Linux, so I'm not going to describe them.

Fixed form extraction

When you use fixed form extraction, you need to tell Ephesoft where each piece of information is located in the document. With Ephesoft on Windows, you use Recostar: you create a new Recostar project and configure it to locate every field. Once the project is created, you just assign it to your batch class using the web user interface.

For Ephesoft on Linux, you can't use Recostar. Ephesoft uses another tool called Nuance. The first thing to do is to find a tool to create the Nuance file. The one recommended by Ephesoft is OmniPage Ultimate (http://www.nuance.com/for-business/by-product/omnipage/ultimate/index.htm), and there is a short piece of documentation explaining how to use it: http://wiki.ephesoft.com/createzonnuance. I got two surprises. The first is that OmniPage is not available on Linux or on MacOS, so I had to install another Windows VM (with Windows XP) to be able to create my Nuance template. The second is that OmniPage Ultimate is not free: it costs almost $500. I searched online for an alternative but didn't find any. Let me know if you know of other tools.

MongoDB plugin

For this last goal, I wanted to demonstrate that Ephesoft is a very powerful tool. Using the plugin mechanism, I created a new plugin (following the rules described in my previous posts) to export documents into MongoDB. I'm not going to explain in depth how it's done, but I followed the same steps as for the Dropbox export plugin. Using the MongoDB Java driver, I'm able to send all the fields, as well as the thumbnail PNG file and the original PNG file that Ephesoft generates for the UI.
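To make the idea concrete, here is a minimal sketch of the kind of document such an export could produce. It is written in Python purely for illustration (the plugin itself is Java), and every key name, the batch identifier and the sample field values are assumptions, not the ones the plugin actually uses. The images are base64-encoded only to keep the sketch free of any driver dependency.

```python
import base64
from datetime import datetime, timezone


def build_export_document(batch_id, doc_type, fields, thumbnail_png, original_png):
    """Assemble one MongoDB-style document for an exported Ephesoft document.

    All key names are hypothetical; the real plugin may use different ones.
    """
    return {
        "batchInstance": batch_id,
        "type": doc_type,
        "exportedAt": datetime.now(timezone.utc).isoformat(),
        # One key per extracted field, so every field can be queried directly.
        "fields": dict(fields),
        # Base64 keeps this sketch driver-free; a real driver would
        # typically store the images as binary values instead.
        "thumbnail": base64.b64encode(thumbnail_png).decode("ascii"),
        "original": base64.b64encode(original_png).decode("ascii"),
    }


# Hypothetical sample values standing in for an extracted I-9 form.
sample = build_export_document(
    "BI1",
    "I9",
    {"LastName": "Doe", "FirstName": "John"},
    b"\x89PNG-thumbnail",
    b"\x89PNG-original",
)
print(sorted(sample))
```

Nesting the extracted values under a single `fields` key is a design choice of this sketch: it keeps the metadata separate from the technical attributes and makes "search on any field" queries easy to build.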

Here is the interesting piece of my sample document:

And here is what you can find in your Mongo database at the end of the export: 

Node.js application

In parallel, I wanted to create a quick document browser based on Node.js, connected to the same MongoDB instance. My requirements were simple:

  • Be able to search documents by any metadata
  • Be able to view details of a document
  • Be able to see the document in its original size
  • Be able to zoom in on the document
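The first requirement, searching by any metadata, can be sketched as a small function that turns search-form parameters into a MongoDB filter. The sketch is in Python for illustration only (the browser itself runs on Node.js), and it assumes the extracted values are stored under a `fields` key, which is a hypothetical layout. `$regex` and `$options` are standard MongoDB query operators; the function name is made up.

```python
import re


def build_metadata_filter(params, prefix="fields"):
    """Turn search-form parameters into a MongoDB-style filter document.

    Each non-empty parameter becomes a case-insensitive substring match
    on the corresponding key under `prefix`.
    """
    query = {}
    for name, value in params.items():
        value = value.strip()
        if not value:
            continue  # ignore search boxes that were left empty
        # Escape the user input so it is matched literally, not as a pattern.
        query[f"{prefix}.{name}"] = {"$regex": re.escape(value), "$options": "i"}
    return query


print(build_metadata_filter({"LastName": "doe", "FirstName": ""}))
```

The resulting dictionary could be passed as the filter of a find call by whichever driver is in use. Field names coming from user input would need validation before being used as keys.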

I was able to achieve that really quickly using Express. Here is a quick demonstration of the interface:

 

Comments

This is a great blog, Ben!

Hi,
I just saw your post and I'm very interested in replicating your use case.
We are working on an Enterprise Content Manager and currently use MongoDB as our document index and search engine.
Are you part of the Ephesoft team?
Best regards,

Hi,

No, I'm not part of the Ephesoft team, but let me know if I can be of any help by sending us a message.

Thanks

Ben

Hello,

I found this blog entry while searching the internet. I have successfully installed Ephesoft 4.0.2.0 on Ubuntu Trusty 14.04, but it throws an error when I try to run a batch job on a PDF document. The error appears right when I try to map the value: I upload the PDF document and it says something like "there was an error uploading the document". When I look at the logs, the cause of the error seems related to the Ghostscript command: apparently there is an error with the command when it tries to convert the document into PNG.

Have you experienced the same behavior? Or do you have a previous version installed?

Thanks in advance.

Hi,

Ghostscript is installed directly by Ephesoft, so the installed version should be compatible with Ephesoft. Can you send me the error trace?

On the Ephesoft wiki, there is already some information about how to resolve issues related to Ghostscript, maybe it can help you: http://wiki.ephesoft.com/category/knowledge-base/ghostscript-knowledge-base

 

Hello,

Right after I upload the PDF file, there is a message saying: "error uploading the file, please try again."

The trace says the following:

"4.0.2.0-COMMUNITY-RELEASE Linux 2015-09-09 01:24:53,750 ERROR pool-2-thread-1 com.ephesoft.dcma.core.threadpool.EphesoftProcessExecutor - The command: "[gs, -r300, -sDEVICE=pngalpha, -dBATCH, <96>dNOPAUSE, -sOutputFile=/opt/Ephesoft/SharedFolders/BC4/test-advanced-extraction/test/sample2.pdf-%04d.png, /opt/Ephesoft/SharedFolders/BC4/test-advanced-extraction/test/sample2.pdf] executing in the working directory: "/usr/local/bin" failed to execute successfully due to: Non-zero exit value failure"

In the first place, I think it is related to Ghostscript, but so far I haven't researched it further.

BTW: Is the CMIS export module available for 4.0.2.0 CE version ?

I've found the cause of the error with GhostScript when uploading pdf file to the batch class job...

As I said the error is produced when I try to upload a PDF in the extraction rule view, the trace is:

"com.ephesoft.dcma.core.exception.DCMAApplicationException: The command: "[gs, -r300, -sDEVICE=pngalpha, -dBATCH, –<96>dNOPAUSE, -sOutputFile=/opt/Ephesoft/SharedFolders/BC4/test-advanced-extraction/Sample/sample2.pdf-%04d.png, /opt/Ephesoft/SharedFolders/BC4/test-advanced-extraction/Sample/sample2.pdf] executing in the working directory: "/usr/local/bin" failed to execute successfully due to: Non-zero exit value failure"

The cause (at least I guess this is the cause) is the "<96>" character added to the command (I don't know yet why it is doing this). I tried the same command with -dNOPAUSE fixed, and it worked...
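One possible reading of the "<96>" marker, offered as an assumption rather than a confirmed diagnosis: 0x96 is the byte that Windows-1252 uses for an en dash, a character that looks like a hyphen but is not one, so Ghostscript would not recognize the option. The Python sketch below shows how such look-alike dashes could be normalized in an argument list before a command runs. The function name is hypothetical and nothing here is part of Ephesoft.

```python
# Characters that look like "-" but are not the ASCII hyphen-minus.
DASH_LOOKALIKES = {"\u2013": "-", "\u2014": "-", "\u2212": "-"}


def normalize_dashes(args):
    """Replace en dash, em dash and minus sign with an ASCII hyphen-minus."""
    table = str.maketrans(DASH_LOOKALIKES)
    return [arg.translate(table) for arg in args]


# Byte 0x96 decodes to the en dash (U+2013) in Windows-1252.
en_dash = b"\x96".decode("cp1252")

# Shortened version of the failing command, with the corrupted option.
command = ["gs", "-r300", "-sDEVICE=pngalpha", "-dBATCH", en_dash + "dNOPAUSE"]
print(normalize_dashes(command))
```

A blanket replacement like this would also rewrite file names that legitimately contain such dashes, so in practice it should only be applied to option arguments.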

I'll have to download the source code, debug it and build it again... I don't know if this is a heavy task. It would be great to hear suggestions from you guys.

Thanks in advance.

Finally I found the errors. To tell the truth, Ephesoft CE has disappointed me; it has a lot of errors. For example:

I downloaded the CE for linux 4.0.2.0 and I had to do a couple of things to get it up and running:

1) Find out what was going on with the Ghostscript command. When I uploaded a file to test the extraction, it wasn't showing the image because of an error related to the gs command: the -dNOPAUSE argument was corrupted with the <96> character (<96>dNOPAUSE). Afterwards, the file uploaded successfully, but then I tried to test the extraction using PNG, PDF, TIFF and TIF files and it just wasn't possible: the key:value frame got lost on the screen every time I tried to perform the extraction. So my conclusion:

Ephesoft CE:

1) The installer doesn't work.
2) The extraction feature doesn't work.
3) I guess that the CMIS feature was removed from the Community Edition at some point (the documentation is outdated).
4) Overall, the Ephesoft documentation is outdated.
5) If I'm not wrong, the last thread posted on the forum is from a few months ago, so the community is not that active.

I don't know if there is any initiative around CE; maybe there is a repository (on GitHub, perhaps) where the community can contribute and make Ephesoft CE better. So far, you can transfer the source code to a personal GitHub account; nonetheless, I don't know which version that is...

Regards.

Hi Rafael,
Did you get your issue resolved? I seem to be having the same issue: The command: "[gs, -q, -dNODISPLAY, -P-, -dSAFER, -dDELAYSAFER, --pdfopt.ps, /opt/Ephesoft/SharedFolders/ephesoft-system-folder/BI4C/tempfile_BI4C_documentDOC1.pdf, /opt/Ephesoft/SharedFolders/ephesoft-system-folder/BI4C/BI4C_documentDOC1.pdf] executing in the working directory: "/usr/local/bin" failed to execute successfully due to: Non-zero exit value failure. I'm unsure how to troubleshoot this, as the community is not active. Any help would be appreciated.

Hello there, I see people installing Ephesoft on Linux... it makes me feel completely incompetent. I have a laptop (for testing) that is powerful enough, with a fresh Ubuntu 16 desktop installed and nothing else. All I have found on how to install is to just go and install Ephesoft (version 4.0.2 community). I follow the instructions and it crashes on Libre, throws an error and exits the installation. Can you please write what needs to be done on Ubuntu 16 before installing Ephesoft? Do I need to install servers, etc.? A step-by-step guide would be perfect for a dummy like me. Thank you so much, and I look forward to your help.

I just created another blog post that lists the steps to install Ephesoft Community on CentOS 6.6 here: http://www.bataon.com/blogs/bchevallereau/centos-66-ephesoft
