Tuesday, September 8, 2009

What is Hibernate Search?


I get this question fairly often, so I suspect the information available online either assumes too much or jumps straight to practical details; I'll try to fill the gap with a very basic introduction.
Hibernate Search is an open source Java project which integrates Hibernate with Lucene. Both libraries have proven extremely useful, are stable and are widely used; in practice many projects need both.
Unfortunately the String world of Lucene is quite different from the Hibernate world, and every project trying to integrate the two is doomed to face the same problems: rewriting more or less the same glue code, and maintaining more code because of its own bugs or because of API changes in one or both frameworks.

Lucene
Lucene is an Apache library which provides full-text search capabilities: you create an index (in memory, on the filesystem, in a database, ...) and then you can search this index using keywords, phrases, boolean queries, etc.
The results are commonly returned by relevance, so the best matching documents come first (think of a web search engine like Google). The main point is that you have full control over how your items are parsed before entering the string world of the index, so you choose which information is important for your business and how the matching rules are defined. It is very fast and generally considered stable, yet new features are constantly added.
Because Lucene is extremely flexible, working with it directly is like programming "low level", so applications often introduce a separation layer to standardize how it is used across the codebase, hiding some of the flexibility and possibly introducing some helpers.
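
To make the idea of "parsing items into the string world of an index" a bit more tangible, here is a tiny in-memory sketch in plain Java. It is not Lucene's API and not how Lucene is implemented: the class ToyIndex, its methods and the naive scoring (counting term occurrences) are all made up for this illustration. It only shows the three steps described above: analyze text into terms, store them in an index, and return matches with the best score first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy full-text index for illustration only; this is NOT Lucene's API.
class ToyIndex {
    // term -> (document id -> number of times the term occurs in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();

    // "Analysis": lowercase the text and split it on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Adds a document to the index under the given id.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Returns the ids of documents matching any query term, highest score first.
    // The score is simply the total number of occurrences of the query terms.
    List<Integer> search(String query) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : analyze(query)) {
            Map<Integer, Integer> docs = postings.getOrDefault(term, Collections.emptyMap());
            docs.forEach((doc, count) -> scores.merge(doc, count, Integer::sum));
        }
        List<Integer> result = new ArrayList<>(scores.keySet());
        result.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Integer.compare(a, b); // ties: lower id first
        });
        return result;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(1, "Hibernate maps objects to database tables");
        index.add(2, "Lucene search: full-text search on an index");
        index.add(3, "A database index speeds up queries");
        System.out.println(index.search("Search index")); // prints [2, 3]
    }
}
```

Document 2 comes first because it contains "search" twice and "index" once, while document 3 only matches "index". A real engine uses far more refined analysis and scoring, which is exactly the flexibility mentioned above.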

Hibernate
The aim of this very successful open source project is to simplify the interaction between the application and the database; technically it's an Object-Relational Mapping service, and you'll find plenty of information and tutorials about it on the web. The important point for introducing Hibernate Search is that Hibernate lets you use POJOs to define the domain model of your application, annotating them to define the mapping to the database, and provides good APIs and even an object-oriented query language to interact with the database, all nicely fitted into a transactional world.

Hibernate Search
Hibernate Search is built on top of Lucene, like Hibernate is built on top of your SQL database. As Hibernate maps POJOs to tables, Hibernate Search maps them to Lucene's index by introducing a new set of annotations. The interesting point here is that you annotate the same entities with both families of annotations, and whether you make a Hibernate query to the database or a Lucene query to the index, you'll get Hibernate managed entities in both cases. You define your domain model once, together with how it maps to the database and to the index. When you make changes to your data the service will update both database and index at transaction commit.
The API to run and paginate queries is an extension of Hibernate's (and JPA's) API, so the changes needed in an application to introduce full-text capabilities are minimal.
When using Lucene directly the code usually gets quite verbose, for example when defining Analyzers or Filters; with Hibernate Search you can define these declaratively and reuse them by name. Last but not least it makes use of several performance-improving tricks, such as sharing file buffers across concurrent reading sessions, caching filter results, batching index changes, and clustering solutions. These are all nice capabilities which you don't need to know about, but they are there in case you need them.
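
To make the mapping idea concrete without pulling in any library, here is a plain Java sketch. The annotation SketchIndexedField, the class MappingSketch and the method toIndexDocument are hypothetical stand-ins declared in the snippet itself; they are not Hibernate Search types, and the real project does much more (identifiers, analyzers, transactions). The sketch only shows the principle: the same domain class carries a declarative marker, and a generic piece of code turns the marked properties into the name/string pairs an index document would hold.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

class MappingSketch {
    // Hypothetical marker declared here for the sketch; not a Hibernate Search annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SketchIndexedField {}

    // One domain class; in a real application it would also carry the database mapping.
    static class Book {
        long id;                          // not marked, so it stays out of the index document
        @SketchIndexedField String title; // goes to the index
        @SketchIndexedField int pages;    // goes to the index, as a string
        String internalNote;              // not marked, so it stays out of the index document

        Book(long id, String title, int pages, String internalNote) {
            this.id = id;
            this.title = title;
            this.pages = pages;
            this.internalNote = internalNote;
        }
    }

    // Reads the marked fields via reflection and converts each value to a string.
    static Map<String, String> toIndexDocument(Object entity) {
        Map<String, String> document = new TreeMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(SketchIndexedField.class)) {
                try {
                    field.setAccessible(true);
                    document.put(field.getName(), String.valueOf(field.get(entity)));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot read field " + field.getName(), e);
                }
            }
        }
        return document;
    }

    public static void main(String[] args) {
        Book book = new Book(1L, "An example title", 100, "not for the index");
        System.out.println(toIndexDocument(book));
    }
}
```

The application code never builds the index document by hand; it only declares which properties matter. That is the part of the glue code which a shared framework saves every project from rewriting.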

Flexibility
Even though it is a simplifying layer between the application and Lucene, it doesn't hide any advanced feature; instead it provides tools to make use of them. Developers can customize all aspects, from defining custom bridges for their own types up to replacing or extending whole parts of the framework. Each major component can be replaced with custom code: define your own index storage strategy by creating a custom DirectoryProvider, use your own LockManager, create a new IndexShardingStrategy, fine-tune all performance settings which Lucene exposes. If you're still missing something, you're free to change the code and submit patches.
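
As an illustration of why custom bridges matter, here is a small plain Java sketch of the concept. The interface SketchBridge and the class PaddedIntBridge are hypothetical and declared in the snippet; Hibernate Search ships its own bridge contracts, which are not reproduced here. A bridge converts a typed value to its string form for the index and back. In the string world "9" sorts after "10", so a numeric type needs a representation such as zero-padding if string order is to match numeric order.

```java
import java.util.Locale;

// Hypothetical contract for this sketch; not the Hibernate Search bridge API.
interface SketchBridge<T> {
    String objectToString(T value);
    T stringToObject(String indexed);
}

// Stores non-negative ints as fixed-width, zero-padded strings.
class PaddedIntBridge implements SketchBridge<Integer> {
    private static final int WIDTH = 10; // Integer.MAX_VALUE has 10 digits

    @Override
    public String objectToString(Integer value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are out of scope for this sketch");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    @Override
    public Integer stringToObject(String indexed) {
        return Integer.valueOf(indexed);
    }

    public static void main(String[] args) {
        PaddedIntBridge bridge = new PaddedIntBridge();
        // Plain strings: "9" sorts after "10".
        System.out.println("9".compareTo("10") > 0); // prints true
        // Padded strings: 9 sorts before 10, matching numeric order.
        System.out.println(bridge.objectToString(9).compareTo(bridge.objectToString(10)) < 0); // prints true
    }
}
```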

Websites:
Hibernate Search - website
Hibernate Search - forums
Lucene's Java implementation website

Books:
Hibernate Search in Action
Java Persistence with Hibernate
Lucene in Action, Second Edition
