Sunday, September 28, 2008

Ideas on Hibernate Search

I'm reading the preview of the upcoming (and excellent) Hibernate Search in Action, and it is giving me ideas for some improvements I would like to implement, in particular:

Multiple backends
As each index has its own DirectoryProvider, why not also its own backend? Some use cases:
  • needing an entity to be indexed "sync" while others "async".
  • wanting an index using JMS, another async / local.
  • using different JMS queues without a selector.
The implementation is not hard at all: I'll have to move some classes from the refactored org.hibernate.search.backend.impl.lucene (I'm working on it already) to org.hibernate.search.backend. Most of the work will probably be discussing the best and simplest way for users to configure them.
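
As a starting point for that configuration discussion, here is a minimal sketch of how a per-index backend could be resolved, assuming the same "default plus per-index override" pattern already used for the DirectoryProvider settings. The property keys and the BackendResolver class are hypothetical; nothing like this exists in Hibernate Search today.

```java
import java.util.Properties;

// Hypothetical helper: not part of Hibernate Search.
final class BackendResolver {

    // Assumed key layout, modelled on the per-index directory_provider settings.
    private static final String PREFIX = "hibernate.search.";
    private static final String SUFFIX = ".worker.backend";

    private BackendResolver() {
    }

    /** Returns the backend name for an index: its own setting, else the default, else "lucene". */
    static String backendFor(Properties cfg, String indexName) {
        String perIndex = cfg.getProperty(PREFIX + indexName + SUFFIX);
        if (perIndex != null) {
            return perIndex;
        }
        return cfg.getProperty(PREFIX + "default" + SUFFIX, "lucene");
    }
}
```

With such a lookup, an index named "Orders" could declare its own backend while every other index keeps following the default setting.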

A scalability-improving ReaderProvider
All ReaderProviders have to guarantee that the returned IndexReader is fully up to date; however, this doesn't make much sense in "async" mode, as the user would probably prefer to trade a slightly outdated reader for some extra throughput.
In my opinion most users will use the "async" mode for most entities, in particular if we enable different backends, as in my experience there are usually only a few entities which need "sync" mode.
I was thinking about implementing a new ReaderProvider which could "wrap" another implementation (for flexibility) and retrieve a new IndexReader from the wrapped one at a configurable interval.
So two initialization arguments: backing implementation (class or name), refresh period (ms).
This way, if the wrapped ReaderProvider is a plain NotSharedReaderProvider, the index would be reopened every X ms:
  • drawback: potentially opening more than needed under low load.
  • advantage: the actual rate is controlled even under high load.
Additionally, if it is wrapping a smarter implementation like shared or shared-segments, the drawback shrinks to a few unnecessary file checks, instead of really re-reading all the data.
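
The wrapping idea can be sketched roughly as follows. This is only an illustration under simplifying assumptions: a plain Supplier stands in for the wrapped ReaderProvider, the clock is injected so the behaviour can be checked without sleeping, and closing the replaced reader is left out entirely. The class name PeriodicRefreshProvider is made up for this example.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: Supplier<R> stands in for the wrapped ReaderProvider,
// R for the IndexReader. This is not the Hibernate Search ReaderProvider API.
final class PeriodicRefreshProvider<R> implements Supplier<R> {

    private final Supplier<R> wrapped;   // the backing implementation
    private final long periodMs;         // refresh period in milliseconds
    private final LongSupplier clockMs;  // injected clock, e.g. System::currentTimeMillis

    private R current;
    private long lastRefreshMs;

    PeriodicRefreshProvider(Supplier<R> wrapped, long periodMs, LongSupplier clockMs) {
        this.wrapped = wrapped;
        this.periodMs = periodMs;
        this.clockMs = clockMs;
    }

    /** Returns the cached reader, asking the wrapped provider again once the period has elapsed. */
    @Override
    public synchronized R get() {
        long now = clockMs.getAsLong();
        if (current == null || now - lastRefreshMs >= periodMs) {
            // A real implementation would also have to release the old reader
            // once no search is using it any more; that part is omitted here.
            current = wrapped.get();
            lastRefreshMs = now;
        }
        return current;
    }
}
```

Note that this sketch refreshes lazily on the first request after the period expires, so an idle application never reopens anything; a timer-driven refresh would match the "opening more than needed under low load" drawback more literally.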

I am actually sorry I didn't have this idea earlier, as I think the implementation is trivial but needs good docs and explanation... too late for inclusion in the book?

Automatic Sharding strategies
As Emmanuel explains in the book, the IdHashShardingStrategy provided with H.Search is more of a "demo" strategy, as the most interesting strategies depend on the user's needs.
It occurs to me that the tips he gives could apply very well to an entity having an Enum property, in which case a great IndexShardingStrategy could be generated automatically and "autoconfigured", as we already know the number of elements and are guaranteed they have different names: good for index name suffixes or something like that.
Just add an annotation to the field, something like
@ShardDiscriminator.
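
To make the idea concrete, here is a rough sketch of what could be derived from an Enum type: one shard per constant, with the constant name used as the index name suffix. The EnumSharding class and the Region enum are invented for this example, and the class deliberately does not implement the real IndexShardingStrategy interface.

```java
import java.util.ArrayList;
import java.util.List;

// Sample enum standing in for the property marked with the proposed @ShardDiscriminator.
enum Region { EUROPE, AMERICAS, ASIA }

// Hypothetical sketch: does not implement the real IndexShardingStrategy interface.
final class EnumSharding<E extends Enum<E>> {

    private final String baseIndexName;
    private final E[] values;

    EnumSharding(String baseIndexName, Class<E> enumType) {
        this.baseIndexName = baseIndexName;
        this.values = enumType.getEnumConstants();
    }

    /** The number of shards is known up front: one per enum constant. */
    int shardCount() {
        return values.length;
    }

    /** Enum names are unique, so they make safe index name suffixes. */
    String indexNameFor(E value) {
        return baseIndexName + "." + value.name();
    }

    /** All index names, in declaration order of the enum constants. */
    List<String> allIndexNames() {
        List<String> names = new ArrayList<>();
        for (E value : values) {
            names.add(indexNameFor(value));
        }
        return names;
    }
}
```

For instance, new EnumSharding<>("Customer", Region.class) would describe three indexes, one of them named "Customer.ASIA".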

Improved Filters exploiting improved Sharding...
Such an IndexShardingStrategy would further enable the code to create a special Filter for use during searches, improving search performance by selecting the correct index to search in (instead of searching all indexes).
The SearchFactory must be restarted anyway when a new value is added, so not being able to add a new Enum value dynamically should not be a limitation. Since you shouldn't do it anyway, I actually like the fact that you can't: it looks like the Java compiler is enforcing correct Hibernate Search configuration and usage...
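
The index selection step might look something like the sketch below: given the Enum values a search is restricted to, only the matching indexes are returned, and an unrestricted search falls back to all of them. ShardSelector and DocumentType are hypothetical names, and the "base.NAME" naming scheme is an assumption rather than anything Hibernate Search does today.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sample enum used only for illustration.
enum DocumentType { INVOICE, CONTRACT, LETTER }

// Hypothetical sketch: not an existing Hibernate Search class.
final class ShardSelector {

    private ShardSelector() {
    }

    /**
     * Returns the names of the indexes a search has to visit.
     * An empty restriction means "no filter", so every index is searched.
     */
    static <E extends Enum<E>> List<String> indexesToSearch(
            String baseIndexName, Class<E> enumType, Set<E> restrictedTo) {
        List<String> names = new ArrayList<>();
        for (E value : enumType.getEnumConstants()) {
            if (restrictedTo.isEmpty() || restrictedTo.contains(value)) {
                names.add(baseIndexName + "." + value.name());
            }
        }
        return names;
    }
}
```

A search restricted to DocumentType.CONTRACT would then touch a single index instead of three.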

The same logic could be applied to any String field or @ManyToOne pointing to an entity whose set of values will not change and which can be transformed into a unique index identifier (require a toString, or have it implement an interface?). We could start by supporting the Enum and see how good it is.

Monday, July 23, 2007

Updating to JBoss 4.2.1

I'm going to update an enterprise application from JBoss 4.0.5 to JBoss 4.2, and I'm writing down here the most important steps and considerations, as asked by friends and my boss.
Why update?
  • Version 4.2 fixes some bugs which were affecting our application.
  • Red Hat announced they are going to use 4.2 in their Enterprise Application Platform, so they are committed to support this version for a long time.
  • I had already updated some libraries of JBoss 4.0.5 to use some features we needed; the new version comes with the same updated libraries, so I expect it to be more compatible than my souped-up, unsupported version.
Technologies
The application to migrate is a JavaEE web application, developed in Eclipse and using these technologies:
  • Seam 1.2.1 is the integrating framework.
  • Facelets for page design.
  • Hibernate 3.2 : core, search, annotations and entitymanager.
  • Lucene 2.2 for fast full-text searching: both using hibernate search and custom code.
  • JTDS as JDBC driver.
  • SQL Server 2000 as database, on a Windows server.
  • Fedora Linux 6 for the webserver.
  • "some" JSF implementation...
We were using the myfaces JSF implementation, but the JBoss and Seam people now recommend using the Sun reference implementation, and this comes bundled as default in the new application server. You have two options:
  1. keep the myfaces JSF implementation.
  2. update to Sun's RI JSF implementation.
We had no particular need to keep myfaces, and since the second option is the one recommended by the JBoss team, we are going to see how painless it is to switch implementations.

So now we can begin our
Migration checklist to JBoss 4.2.1 and Sun's JSF
from JBoss 4.0.5 and myfaces.
In application.xml remove the following modules:
<module>
<java>el-api.jar</java>
</module>
<module>
<java>el-ri.jar</java>
</module>
and add this one instead:
<module>
<java>commons-collections-3.1.jar</java>
</module>
You will also need to remove and add the corresponding jars in the root of your ear. commons-collections-3.1 is needed by Ajax4jsf; JBoss now ships with a different version.

In faces-config.xml add

<application>
<el-resolver>org.jboss.seam.jsf.SeamELResolver</el-resolver>
<message-bundle>messages</message-bundle>
</application>
and update the headers to:
<faces-config version="1.2"
xmlns="http://java.sun.com/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-facesconfig_1_2.xsd">

In web.xml remove this listener:

org.apache.myfaces.webapp.StartupServletContextListener
and all other references to myfaces classes. No new listener should be needed.
If you use Tomahawk you may want to keep some org.apache.myfaces context parameters: it should work on Sun's RI, but it still reads parameters named in the myfaces style.

You may also want to update the header of jboss-app.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE jboss-app
PUBLIC "-//JBoss//DTD J2EE Application 4.2//EN"
"http://www.jboss.org/j2ee/dtd/jboss-app_4_2.dtd">
Problems & some fixes
Content-Type
With myfaces the rendered pages have "Content-Type: text/html;", as reported by Firefox's "web developer" plugin; with Sun's RI the content type is now "Content-Type: application/xhtml+xml;".
Technically the second one should be better, as discussed here, but it brings some issues:
  1. Even with the same HTML and CSS the pages may look different.
  2. Internet Explorer (up to version 6) doesn't like "xhtml+xml".
  3. Some redirects won't work.
The good news is that Firefox will now do a full check of your pages, so you may find some errors faster. To get the content served as text/html, define the contentType attribute in your views:
<f:view contentType="text/html"...
Still, this leaves a problem with Seam's PDF rendering: there's no way to modify the content type of the redirect servlet that brings you from the page to the PDF download link. I'm going to see if I can get Apache to force the "correct" content type served to clients.