Handling Large Streams of Data through HTTP with JBoss Fuse

February 13, 2017 (last updated: February 13, 2017)
jboss-fuse integration development data api camel

https://github.com/alainpham/large-http-streams


Abstract

Service and API platforms are mostly real-time oriented and handle small amounts of data that can easily be processed in memory. For many legacy use cases and for content management systems, however, handling large sets of data is a very common requirement. This article shows how to send and receive large data files through HTTP with JBoss Fuse using streams. The main objective of using streams is to avoid running into out-of-memory issues.

JBoss Fuse and its streaming capabilities

If you are new to JBoss Fuse (and to Camel in particular), you may have run into problems when the content of the message body was a stream. If the body needs to be read more than once, the second read attempt results in an exception, because by default a stream can only be read once. This can be confusing at first, but it gives great flexibility in how the content is processed. The options for customizing stream handling are powerful and allow advanced tuning. As an example, this is what can be done with a stream:
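
One such customization is stream caching, which lets a body be read more than once by buffering it and spooling large payloads to disk rather than holding them in memory. The fragment below is a minimal sketch, not taken from the example repository: the port 8124, the spool directory, the threshold value and the two output directories are assumptions chosen for illustration, and the attribute names follow the Camel 2.x Spring XML schema, so check them against the Camel version shipped with your Fuse release.

```xml
<camelContext streamCache="true" xmlns="http://camel.apache.org/schema/spring">
	<!-- Assumed values: bodies larger than 64 KB are spooled to this directory instead of staying on the heap -->
	<streamCaching id="cacheConfig" spoolDirectory="/tmp/camel-stream-cache" spoolThreshold="65536" />
	<route>
		<!-- Hypothetical endpoint, stream caching left enabled on the consumer -->
		<from uri="jetty:http://0.0.0.0:8124" />
		<!-- First read of the cached stream -->
		<to uri="file:out-a" />
		<!-- Second read: Camel resets the cached stream before this step -->
		<to uri="file:out-b" />
	</route>
</camelContext>
```

The trade-off is extra disk I/O for the spooled copy, which is why the example later in this article disables stream caching when the body only needs to be read once.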

Streaming data with camel-jetty, camel-http4 and file components

Components such as camel-jetty, camel-http4 and the file component, which we will use here, do not load the entire payload into memory for processing unless we explicitly force them to. Instead, they set the message body to a stream object (a pointer) that can be passed along the Camel route and read only when needed. You can find an implementation example using these components here:
https://github.com/alainpham/large-http-streams

Below is the workflow implemented in the code example. The content of the input file is never loaded entirely into memory.

Figure: streaming files through HTTP with Fuse
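<camelContext xmlns="http://camel.apache.org/schema/spring">
	<!-- Sender: picks up files whose names end in "ready" and PUTs each one to the HTTP endpoint -->
	<route>
		<from uri="file:input?include=.*ready&amp;move=done" />
		<setHeader headerName="CamelHttpMethod">
			<constant>PUT</constant>
		</setHeader>
		<!-- The file is sent as the request body without being loaded entirely into memory -->
		<to uri="http4:localhost:8123"></to>
		<!-- Logs the short HTTP response body returned by the receiver -->
		<log message="${body}"></log>
	</route>
	<!-- Receiver: disableStreamCache=true hands the raw request stream to the route -->
	<route>
		<from uri="jetty:http://0.0.0.0:8123?disableStreamCache=true"/>
		<log message="Handling a stream of class : ${body.class}"></log>
		<log message="${headers}"></log>
		<!-- Writes the incoming stream to the out directory -->
		<to uri="file:out"></to>
		<!-- Short reply sent back to the HTTP client -->
		<setBody>
			<constant>DONE</constant>
		</setBody>
	</route>
</camelContext>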

We stream the initial file from end to end, with HTTP as the wire protocol between the two routes. To verify that we are really streaming and not loading everything into memory, we can proceed as follows:

If the data were not streamed, the run would fail with an out-of-memory error, for instance if at some point we converted the body to a String (e.g. right after the camel-jetty consumer). Note also that some type conversions happen implicitly, such as from a File to a stream for the camel-http4 component.
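
To make that failure mode concrete, here is a hypothetical variant of the receiving route that forces the whole payload into memory. It is shown only as a counter-example and is not part of the example repository.

```xml
<route>
	<from uri="jetty:http://0.0.0.0:8123?disableStreamCache=true"/>
	<!-- Counter-example: materializes the entire request stream as a single String on the heap -->
	<convertBodyTo type="java.lang.String"/>
	<to uri="file:out"/>
</route>
```

With a file larger than the available heap, a route like this is expected to fail at the conversion step, whereas the streaming route above writes the file without holding it in memory.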

The nice thing about JBoss Fuse is that although handling streams is a fairly advanced task, the code stays very simple and maintainable compared to custom low-level Java code. This is thanks to the reusability of the framework's components and the smart implicit type conversions built into Camel. We could now easily tweak this example to add more complex patterns, such as stream-parsing XML data (e.g. with XTokenizer) and routing the chunks to other processing units. These units may run in multiple threads or engines for parallel processing, which makes the solution naturally scalable.
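
As a rough sketch of that idea, the receiving route could split the incoming XML stream into chunks and hand them to a pool of consumers. Everything specific here is an assumption made for illustration: the `order` element name, the `seda:processOrder` queue and the number of concurrent consumers are hypothetical, and the token expression may need namespace prefixes depending on the actual XML.

```xml
<route>
	<from uri="jetty:http://0.0.0.0:8123?disableStreamCache=true"/>
	<!-- streaming="true" makes the splitter iterate over the input instead of collecting all parts first -->
	<split streaming="true">
		<!-- Hypothetical token expression: emits one message per order element -->
		<xtokenize>//order</xtokenize>
		<to uri="seda:processOrder"/>
	</split>
	<setBody>
		<constant>DONE</constant>
	</setBody>
</route>
<route>
	<!-- Hypothetical worker route: four threads consume chunks in parallel -->
	<from uri="seda:processOrder?concurrentConsumers=4"/>
	<log message="Processing chunk: ${body}"/>
</route>
```

Note that an in-memory queue can itself fill up the heap if the workers are slower than the splitter, so in practice it may be worth bounding the queue or using a persistent broker between the two routes.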

Figure: streaming XML files through HTTP

