HTML to PDF transformation with wkhtmltopdf

Introduction

In some Alfresco implementations, you have to generate documents from metadata. There are many options, from text documents to PDF, depending on the constraints of the project. In my case, I had to generate a printable and exportable document from metadata, so my idea was to generate an HTML file and use Alfresco's HTML to PDF transformer. The document also had to be good-looking and easy to read, which implies the use of CSS files. Here is the result:

The result is definitely not acceptable, so I started to look for another transformer and discovered wkhtmltopdf. It is a simple shell utility that converts HTML to PDF using the WebKit rendering engine shipped with Qt 4.8. After some local tests, I decided to integrate it into Alfresco. Moreover, this tool is available on MacOS, Linux and Windows.

Creation of the transformer

So, I created a file named transformer-context.xml in the folder alfresco/WEB-INF/classes/alfresco/extension. The first thing to do is to declare the transformer:

<bean id="transformer.worker.Html2pdf" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">
	<property name="mimetypeService">
		<ref bean="mimetypeService" />
	</property>
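	<!-- Run at startup to check that the executable is available; if it fails, the transformer is not started -->
	<property name="checkCommand">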
		<bean class="org.alfresco.util.exec.RuntimeExec">
			<property name="commandMap">
				<map>
					<entry key=".*">
						<value>${wkhtmltopdf.exe} -V</value>
					</entry>
				</map>
			</property>
			<!-- Exit codes that RuntimeExec treats as a failure -->
			<property name="errorCodes">
				<value>1</value>
			</property>
		</bean>
	</property>
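	<!-- ${source} and ${target} are replaced at run time with the paths of the input and output files -->
	<property name="transformCommand">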
		<bean class="org.alfresco.util.exec.RuntimeExec">
			<property name="commandMap">
				<map>
					<entry key=".*">
						<value>${wkhtmltopdf.exe} ${source} ${target}</value>
					</entry>
				</map>
			</property>
			<property name="errorCodes">
				<value>1</value>
			</property>
		</bean>
	</property>
	<property name="explicitTransformations">
		<list>
			<bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
				<property name="sourceMimetype">
					<value>text/html</value>
				</property>
				<property name="targetMimetype">
					<value>application/pdf</value>
				</property>
			</bean>
		</list>
	</property>
</bean>

<bean id="transformer.html2pdf" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
	<property name="worker">
		<ref bean="transformer.worker.Html2pdf" />
	</property>
</bean>
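
Since wkhtmltopdf runs on several platforms, one possible variation is to give the command map one entry per operating system instead of the single catch-all ".*" key. The fragment below is only a sketch of the transformCommand property. It assumes that the map keys are regular expressions matched against the JVM's os.name value, and the property names wkhtmltopdf.win.exe and wkhtmltopdf.unix.exe are hypothetical names chosen for this illustration, not properties that Alfresco defines.

```xml
<property name="transformCommand">
	<bean class="org.alfresco.util.exec.RuntimeExec">
		<property name="commandMap">
			<map>
				<!-- Assumption: key is a regex matched against os.name -->
				<entry key="Windows.*">
					<!-- wkhtmltopdf.win.exe is a hypothetical property name -->
					<value>${wkhtmltopdf.win.exe} ${source} ${target}</value>
				</entry>
				<entry key="Linux">
					<!-- wkhtmltopdf.unix.exe is a hypothetical property name -->
					<value>${wkhtmltopdf.unix.exe} ${source} ${target}</value>
				</entry>
				<entry key="Mac OS X">
					<value>${wkhtmltopdf.unix.exe} ${source} ${target}</value>
				</entry>
			</map>
		</property>
		<property name="errorCodes">
			<value>1</value>
		</property>
	</bean>
</property>
```

The keys are deliberately non-overlapping so that the result does not depend on the order in which the entries are evaluated. With a single installation, the one-entry version above is simpler and sufficient.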

By default, Alfresco uses OpenOffice (through JodConverter) for this transformation. So, we need to redefine the relevant bean and add an unsupportedTransformations block at the end, so that this transformer is no longer used to transform HTML to PDF.

<bean id="transformer.JodConverter.Html2Pdf" class="org.alfresco.repo.content.transform.ComplexContentTransformer" parent="baseComplexContentTransformer">
	<property name="transformers">
		<list>
			<ref bean="transformer.JodConverter" />
			<ref bean="transformer.JodConverter" />
		</list>
	</property>
	<property name="intermediateMimetypes">
		<list>
			<value>application/vnd.oasis.opendocument.text</value>
		</list>
	</property>
	<property name="supportedTransformations">
		<list>
			<bean class="org.alfresco.repo.content.transform.SupportedTransformation">
				<property name="sourceMimetype">
					<value>text/html</value>
				</property>
				<property name="targetMimetype">
					<value>application/pdf</value>
				</property>
			</bean>
		</list>
	</property>
	<property name="explicitTransformations">
		<list>
			<bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
				<property name="sourceMimetype">
					<value>text/html</value>
				</property>
				<property name="targetMimetype">
					<value>application/pdf</value>
				</property>
			</bean>
		</list>
	</property>
	<!-- Added block: stops this transformer from being used for HTML to PDF -->
	<property name="unsupportedTransformations">
		<list>
			<bean class="org.alfresco.repo.content.transform.SupportedTransformation">
				<property name="sourceMimetype">
					<value>text/html</value>
				</property>
				<property name="targetMimetype">
					<value>application/pdf</value>
				</property>
			</bean>
		</list>
	</property>
</bean>

And finally, we override the failover transformer so that it references our new transformer.

<bean id="transformer.JodConverter.2Pdf" class="org.alfresco.repo.content.transform.FailoverContentTransformer" parent="unregisteredBaseContentTransformer">
	<property name="transformers">
		<list>
			<ref bean="transformer.JodConverter" />
			<ref bean="transformer.html2pdf" />
		</list>
	</property>
	<property name="supportedTransformations">
		<list>
			<bean class="org.alfresco.repo.content.transform.SupportedTransformation">
				<property name="targetMimetype">
					<value>application/pdf</value>
				</property>
			</bean>
		</list>
	</property>
</bean>
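
For completeness, here is a minimal sketch of how the context file itself might be laid out. It assumes the DTD-style Spring header used by Alfresco's own context files of this generation, and relies on Alfresco loading files named *-context.xml from the extension folder. The comments only mark where each of the beans shown above would go.

```xml
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
<beans>
	<!-- 1. New worker and transformer: transformer.worker.Html2pdf and transformer.html2pdf -->

	<!-- 2. Redefinition of transformer.JodConverter.Html2Pdf with the unsupportedTransformations block -->

	<!-- 3. Override of transformer.JodConverter.2Pdf that references transformer.html2pdf -->
</beans>
```

Keeping all three definitions in one extension file means the customization can be removed by deleting a single file, without touching the context files shipped with Alfresco.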

Conclusion

The result is quite amazing: the SWF preview looks exactly like the HTML page. So, in our case, we used a Freemarker template to generate HTML content from metadata and then generated the PDF from that HTML content.

More information about wkhtmltopdf:

Comments

wkhtmltopdf is a nice utility - we use it to convert the HTML body of RFC822 emails. What is your experience regarding the quality of images in the rendered PDF? There is some improvement to be made from the experience we had with the tool.

No complaints about the quality! But our real example contains only a logo in the top left corner and "check" icons in each row, so it is not really a relevant test of image quality. It probably depends on the version; I checked this morning and they released a new version just 4 days ago.

Failed to start a runtime executable content transformer:
Execution result:
os: Windows NT (unknown)
command: ${wkhtmltopdf.exe} -V
succeeded: false
exit code: 1
out:
err: Cannot run program "${wkhtmltopdf.exe}": CreateProcess error=2, The system cannot find the file specified

Hi, you probably didn't specify the wkhtmltopdf.exe property in your alfresco-global.properties file. Just add this property, set to the path of the executable, and it should work.

Hi
Great article.
I created a transformer for PDF -> HTML using the pdf2htmlEX tool for the conversion. The transform command always fails, and debugging seems to indicate that the source/target file doesn't exist at the time of the transformation, because copying and executing the exact same command in a terminal succeeds. I'm just wondering whether you came across this issue before.

Hi,

Can you give me the value of the property "wkhtmltopdf.exe" that you configured? And did you enable the debug level for the transformation class in Alfresco?

log4j.logger.org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker=DEBUG
log4j.logger.org.alfresco.util.exec.RuntimeExec=DEBUG
log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG
