Wednesday, September 11, 2013

Taste of Apache Camel: Part IV - Solr

In the previous post we read a file with some movie titles and the actors playing in those movies, and printed everything to the console. More often than not, though, received data should be stored somewhere. Rather than just saving it to a file on disk, I will show how to use Apache Solr, an enterprise search platform used for full-text search, hit highlighting, faceted search, etc.
First, we must create a new Solr core (or change an existing one if you like).

Its schema has two fields that are important for our example (title and actors); the rest are mandatory for Solr to run properly.
To use Solr with Camel, a new dependency, camel-solr, must be added to pom.xml.

From here on, there is more than one way to store data in Solr with Camel.

Example 1: Add Bean

To start, we will continue from the previous post.
First, we must make some minimal changes to the Movie POJO. In order to store the data in this POJO in Solr, some annotations must be added.
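A minimal sketch of the annotated Movie POJO might look like the following. Since the earlier post's code is not reproduced here, it assumes that Movie holds a String title and a List<String> of actors, and that the Solr schema declares title and a multi-valued actors field. SolrJ's @Field annotation lives in org.apache.solr.client.solrj.beans; here title is annotated on the field and actors on its setter, to show both placements.

```java
import java.util.List;

import org.apache.solr.client.solrj.beans.Field;

// Movie POJO annotated so SolrJ can map it to a Solr document.
// Assumption: the schema has "title" and a multi-valued "actors" field.
public class Movie {

    // Annotated field: with no value given, the Solr field name is the Java field name.
    @Field
    private String title;

    private List<String> actors;

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public List<String> getActors() {
        return actors;
    }

    // Alternative placement: annotate the setter (not the getter).
    // The explicit name maps it to the "actors" field in the schema.
    @Field("actors")
    public void setActors(List<String> actors) {
        this.actors = actors;
    }
}
```

When a bean is added, SolrJ's DocumentObjectBinder reads these annotations to decide which property goes into which Solr field.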

Here I annotated the fields with the SolrJ annotation @Field, but setters (not getters) could be annotated, too.
We don't need to change ExampleBean, at least for this example.
Our route, however, has to change in order to call Solr. We will add a new route in the configure() method of our route builder class.
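A sketch of such a route, assuming Camel 2.x with camel-solr on the classpath, could look like this. The class name SolrRouteBuilder, the file URI and the core URL solr://localhost:8983/solr/movies are placeholders, and ExampleBean from the previous post is assumed to be registered in the Camel registry under the hypothetical name exampleBean, with giveMeMovies returning a List<Movie>.

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.solr.SolrConstants;

// Hypothetical route builder; endpoint URIs and the bean name are placeholders.
public class SolrRouteBuilder extends RouteBuilder {

    // Assumed address of the Solr core created earlier.
    private static final String SOLR_URL = "solr://localhost:8983/solr/movies";

    @Override
    public void configure() throws Exception {
        from("file:data/movies?noop=true")
            // ExampleBean.giveMeMovies (previous post) turns the file into a List<Movie>.
            .to("bean:exampleBean?method=giveMeMovies")
            .split(body())
                // Each Movie is sent to Solr on its own as an annotated bean.
                .setHeader(SolrConstants.OPERATION, constant(SolrConstants.OPERATION_ADD_BEAN))
                .to(SOLR_URL)
            .end()
            // After the splitter: commit so the new documents become searchable...
            .setHeader(SolrConstants.OPERATION, constant(SolrConstants.OPERATION_COMMIT))
            .to(SOLR_URL)
            // ...and, optionally, optimize the index.
            .setHeader(SolrConstants.OPERATION, constant(SolrConstants.OPERATION_OPTIMIZE))
            .to(SOLR_URL);
    }
}
```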

This route is a copy of the last one from the previous post, with a few changes. First, the processor that was there just for printing to the console is gone. It is replaced with setHeader and the URL of the Solr core. In the header we must set the Solr operation to ADD_BEAN, since a bean is what we are storing. Then we call Solr, which receives the data from our POJO. After the splitter, the data must be committed, and optionally optimized, before it can be searched in Solr.
This approach is very easy to use, but with a lot of data it becomes quite slow, because we feed Solr a single document at a time. It would be much faster if we could send more than one document per request.

Example 2: Insert SolrInputDocument

To achieve that, we must first change the ExampleBean method giveMeMovies or make a new one. I made a new one.
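A sketch of such a method follows. The input format (one "Title;Actor One,Actor Two" entry per line), the method name giveMeMovieDocuments and its String parameter are all hypothetical; adapt them to however giveMeMovies parses the file in the previous post. The names passed to setField must match the fields in the Solr schema.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.common.SolrInputDocument;

// Sketch of a new ExampleBean method (in the post it sits next to giveMeMovies).
// Assumption: each input line looks like "Title;Actor One,Actor Two".
public class ExampleBean {

    public List<SolrInputDocument> giveMeMovieDocuments(String body) {
        List<SolrInputDocument> documents = new ArrayList<SolrInputDocument>();
        for (String line : body.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split(";", 2);
            SolrInputDocument doc = new SolrInputDocument();
            // Field names must match the Solr schema ("title" and "actors").
            doc.setField("title", parts[0].trim());
            if (parts.length > 1) {
                List<String> actors = new ArrayList<String>();
                for (String actor : parts[1].split(",")) {
                    actors.add(actor.trim());
                }
                // A collection value fills the multi-valued "actors" field.
                doc.setField("actors", actors);
            }
            documents.add(doc);
        }
        return documents;
    }
}
```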

Here the method does not return a list of Movie objects, but a list of Solr's own SolrInputDocument objects. You must know the field names in the Solr schema, but they are the same as in the Movie class. Instead of setting data on a Movie object with setters, we put it into a SolrInputDocument with the setField method.
To store SolrInputDocuments, the route must be changed, too.
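A possible version of that route, still with the splitter, is sketched below. It assumes Camel 2.x, a placeholder core URL solr://localhost:8983/solr/movies, and a bean registered under the hypothetical name exampleBean whose (equally hypothetical) giveMeMovieDocuments method turns the file contents into a List<SolrInputDocument>.

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.solr.SolrConstants;

// Hypothetical route builder; endpoint URIs and the bean name are placeholders.
public class SolrInsertRouteBuilder extends RouteBuilder {

    private static final String SOLR_URL = "solr://localhost:8983/solr/movies";

    @Override
    public void configure() throws Exception {
        from("file:data/movies?noop=true")
            // Read the file as text before handing it to the bean.
            .convertBodyTo(String.class)
            // Hypothetical bean method returning List<SolrInputDocument>.
            .to("bean:exampleBean?method=giveMeMovieDocuments")
            .split(body())
                // INSERT instead of ADD_BEAN: the body is now a SolrInputDocument.
                .setHeader(SolrConstants.OPERATION, constant(SolrConstants.OPERATION_INSERT))
                .to(SOLR_URL)
            .end()
            // Commit once all documents have been sent.
            .setHeader(SolrConstants.OPERATION, constant(SolrConstants.OPERATION_COMMIT))
            .to(SOLR_URL);
    }
}
```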

Here the change is minimal: the operation changed from ADD_BEAN to INSERT. This should work nicely, but it still does not fix the speed issue, because this code still sends a single document at a time to Solr. To get around this, the splitter must go.
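A sketch of the route without the splitter might look like this. As before, it assumes Camel 2.x, a placeholder core URL, and a bean registered under the hypothetical name exampleBean whose hypothetical giveMeMovieDocuments method returns the whole List<SolrInputDocument>.

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.solr.SolrConstants;

// Hypothetical route builder; endpoint URIs and the bean name are placeholders.
public class SolrStreamingRouteBuilder extends RouteBuilder {

    private static final String SOLR_URL = "solr://localhost:8983/solr/movies";

    @Override
    public void configure() throws Exception {
        from("file:data/movies?noop=true")
            // Read the file as text before handing it to the bean.
            .convertBodyTo(String.class)
            // Hypothetical bean method returning List<SolrInputDocument>.
            .to("bean:exampleBean?method=giveMeMovieDocuments")
            // No splitter: the whole list is sent to Solr in a single call.
            .setHeader(SolrConstants.OPERATION, constant(SolrConstants.OPERATION_INSERT_STREAMING))
            .to(SOLR_URL)
            // Commit so the new documents become searchable.
            .setHeader(SolrConstants.OPERATION, constant(SolrConstants.OPERATION_COMMIT))
            .to(SOLR_URL);
    }
}
```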

Here the splitter was replaced with a bean call, and the INSERT operation was replaced with INSERT_STREAMING. Now Solr is called only once and receives the whole list of movies.

The camel-solr component has a few more operations, such as DELETE_BY_ID and DELETE_BY_QUERY, but it has no operation for querying and retrieving data. To do that, you can query Solr over HTTP.
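As an illustration of querying over HTTP, the sketch below calls Solr's standard /select request handler with plain Java. It uses java.net.http.HttpClient, so it assumes Java 11 or later (any HTTP client would do). The class name SolrHttpQuery, the core URL and the example query are placeholders; wt=json asks Solr for a JSON response.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for querying a Solr core over plain HTTP.
public class SolrHttpQuery {

    // Builds a URL for the /select handler, URL-encoding the query string.
    static String buildQueryUrl(String coreUrl, String query) {
        return coreUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8) + "&wt=json";
    }

    // Sends the query and returns Solr's raw JSON response.
    static String query(String coreUrl, String query) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(buildQueryUrl(coreUrl, query)))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Solr returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

For example, with the movies core running locally, query("http://localhost:8983/solr/movies", "actors:\"Al Pacino\"") should return the matching movies as JSON, which you can then parse with the JSON library of your choice.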

My posts on Apache Camel:
 - Part I - Property Placeholder
 - Part II - Using Beans
 - Part III - More on using Beans
 - Part IV - Solr
 - Part V - Marshalling and Unmarshalling