Random Colors

Crude Java and Agile thoughts

Maven Lucene Plugin 1.0 : Documentation and Usage


Maven Lucene Plugin is an open source Maven plugin for Apache Lucene developed by Xebia India IT Architects Ltd. The project is hosted on SourceForge and can be found here. It is released under the GNU General Public License (GPL).

The artifact is available in the Central Maven Repository at http://repo1.maven.org/maven2/com/xebia/. You can browse the artifact here.

maven-lucene-plugin creates a Lucene index from a file source. The structure of the Lucene index (fields, analyzers, indexLocation, sourceFileLocation, store, etc.) is configured in a configuration file, lucene.xml.

lucene.xml contains all information about the Lucene index and the data source from which the index is created. The Maven Lucene Plugin looks for lucene.xml in the project root directory (adjacent to pom.xml) and creates the Lucene index from the file source named in lucene.xml, based on the index configuration provided.

Apache Lucene Version

The maven-lucene-plugin uses Lucene version 3.0.3. The index can be opened using Luke 1.0.1. Luke can be downloaded from here.

Configuring plugin in pom.xml

Configuring the maven-lucene-plugin in pom.xml is simple, because all information about the index configuration is written in lucene.xml. The index is created by running the plugin's “createIndex” goal. The goal can be bound to any build phase, as shown in the example plugin configuration, or it can be executed directly from the command line with “mvn lucene:createIndex”.

The “createIndex” goal looks for a file “lucene.xml” in the project root directory (adjacent to pom.xml). The lucene.xml file needs to have all the information related to the configuration of index.

Example pom.xml is shown below:

<build>
 <plugins>
  <plugin>
    <groupId>com.xebia</groupId>
    <artifactId>maven-lucene-plugin</artifactId>
    <version>1.0</version>
    <executions>
     <execution>
      <phase>install</phase>
      <goals>
       <goal>createIndex</goal>
      </goals>
     </execution>
    </executions>
  </plugin>
 </plugins>
</build>

Example lucene.xml

maven-lucene-plugin looks for a file lucene.xml in the project root directory (adjacent to pom.xml). The maven-lucene-plugin will not work without lucene.xml. lucene.xml contains all the information about the lucene index.

An example lucene.xml is provided below; each tag is explained in detail after it:

<index>
  <indexLocation>C:\\index</indexLocation>
  <sourceFileLocation>C:\\file\\data.txt</sourceFileLocation>
  <overWrite>true</overWrite>
  <separator>;</separator>
  <field>
    <name>names</name>
    <type>string</type>
    <analyzer>WhiteSpaceAnalyzer</analyzer>
    <store>true</store>
  </field>
  <field>
    <name>hobby</name>
    <type>string</type>
    <analyzer>SimpleAnalyzer</analyzer>
    <store>true</store>
  </field>
</index>

The parent tag is index. All other information is encapsulated inside it.

  1. <indexLocation>C:\\index</indexLocation>

    The indexLocation tag points to the directory on your disk where the index will be created when the createIndex goal is executed.

  2. <sourceFileLocation>C:\\file\\data.txt</sourceFileLocation>

    The sourceFileLocation tag points to the file on your disk that will be indexed.

  3. <overWrite>true</overWrite>

    The overWrite tag specifies whether an existing index at the indexLocation (if any) should be overwritten.

  4. <separator>;</separator>

    The separator tag specifies the delimiter that separates the field names, and the values to be indexed, in the source file. The default separator is ; (please refer to the Source File Format section below for more detailed information).

  5. <field>
     <name>names</name>
     <type>string</type>
     <analyzer>WhiteSpaceAnalyzer</analyzer>
     <store>true</store>
    </field>

A <field> specifies a Lucene Field. An <index> can have one or many <field> tags.

A <field> should contain the following tags: <name>, <type>, <analyzer> and <store>.

The following analyzers are supported by maven-lucene-plugin in release 1.0 and can be named in the <analyzer> tag. The default is the WhitespaceAnalyzer.

  • “WhitespaceAnalyzer”
  • “SimpleAnalyzer”
  • “KeyWordAnalyzer”
  • “StopAnalyzer”
  • “StandardAnalyzer”

6. <store> specifies whether this field will be stored in the index or not.
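
To make the tag descriptions above concrete, here is a minimal sketch of how a lucene.xml with this structure could be read using only the JDK's DOM parser. It is not the plugin's actual code: the class LuceneXmlSketch, the holder FieldConfig and all method names are hypothetical. The defaults for <separator> and <analyzer> follow the text above; treating a missing <store> as false is purely an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of maven-lucene-plugin.
final class LuceneXmlSketch {

    /** Plain holder for one <field> entry (illustrative only). */
    static final class FieldConfig {
        final String name;
        final String type;
        final String analyzer;
        final boolean store;

        FieldConfig(String name, String type, String analyzer, boolean store) {
            this.name = name;
            this.type = type;
            this.analyzer = analyzer;
            this.store = store;
        }
    }

    /** Parses the XML text into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("lucene.xml could not be parsed", e);
        }
    }

    /** Returns the trimmed text of the first descendant tag, or the fallback if absent. */
    static String childText(Element parent, String tag, String fallback) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? fallback : nodes.item(0).getTextContent().trim();
    }

    /** Reads <separator>; the documented default is ";". */
    static String separator(String xml) {
        return childText(parse(xml).getDocumentElement(), "separator", ";");
    }

    /** Reads every <field> under <index>. */
    static List<FieldConfig> fields(String xml) {
        NodeList nodes = parse(xml).getDocumentElement().getElementsByTagName("field");
        List<FieldConfig> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element field = (Element) nodes.item(i);
            result.add(new FieldConfig(
                    childText(field, "name", ""),
                    childText(field, "type", "string"),
                    childText(field, "analyzer", "WhitespaceAnalyzer"), // documented default
                    Boolean.parseBoolean(childText(field, "store", "false")))); // assumed default
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<index><separator>;</separator>"
                + "<field><name>names</name><type>string</type>"
                + "<analyzer>SimpleAnalyzer</analyzer><store>true</store></field></index>";
        for (FieldConfig f : fields(xml)) {
            System.out.println(f.name + " / " + f.analyzer + " / store=" + f.store);
        }
    }
}
```

The sketch only extracts values. Mapping an analyzer name to a Lucene analyzer instance and writing the index are left to the plugin.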

Source File Format

This section describes the file source (the file containing the data to be indexed). The data in the file source should be written in the format explained below.

  • The top row represents the field names separated by the separator.
  • The separator is defined in lucene.xml.
  • The default separator is “;”.
  • The rows other than the top row contain the values for the fields.
  • Each field name should be the same as the one specified in the <field><name>fieldName</name></field> tag in lucene.xml.

The source file can be seen as a set of columns, with the element at the top of each column as the field name. Technically, each row other than the top row (the field names) is treated as one Lucene document.

For example, in the example file shown below, names and hobby are the field names.

“names” field has values “albert einstein” and “Leonardo da Vinci”.
“hobby” field has values “invention and discovery” and “Writing Painting”

Example File

names;hobby
albert einstein;invention and discovery
Leonardo da Vinci;Writing Painting
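
As a rough illustration of the format described above, the following sketch turns the lines of a source file into one field-to-value map per row, which corresponds to one Lucene document per row. It uses no Lucene classes. The class SourceFileSketch and its parse method are hypothetical names, and skipping blank lines and trimming whitespace are assumptions about how such a file might be read, not documented plugin behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of maven-lucene-plugin.
final class SourceFileSketch {

    /**
     * Treats the first line as the field names and every following
     * non-blank line as one document (field name -> value).
     */
    static List<Map<String, String>> parse(List<String> lines, String separator) {
        List<Map<String, String>> documents = new ArrayList<>();
        if (lines.isEmpty()) {
            return documents;
        }
        // Quote the separator so characters such as "|" are not read as regex syntax.
        String sep = Pattern.quote(separator);
        String[] fieldNames = lines.get(0).split(sep, -1);
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            String[] values = line.split(sep, -1);
            Map<String, String> document = new LinkedHashMap<>();
            for (int i = 0; i < fieldNames.length && i < values.length; i++) {
                document.put(fieldNames[i].trim(), values[i].trim());
            }
            documents.add(document);
        }
        return documents;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "names;hobby",
                "albert einstein;invention and discovery",
                "Leonardo da Vinci;Writing Painting");
        for (Map<String, String> document : parse(lines, ";")) {
            System.out.println(document);
        }
    }
}
```

With the example file, this yields two documents, each holding a names value and a hobby value.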

Maven Lucene Search

maven-lucene-search is a dependency that can be used to search Lucene indexes without writing any Lucene-specific code.

It can be added with the following dependency:

<dependency>
 <groupId>com.xebia</groupId>
 <artifactId>maven-lucene-search</artifactId>
 <version>1.0</version>
</dependency>

It requires a lucene.xml file in src/main/resources. The structure of this lucene.xml file is the same as described earlier for maven-lucene-plugin.

Including maven-lucene-search as a dependency provides a LuceneSearcher interface with an implementation, LuceneSearcherImpl. LuceneSearcher provides simple methods for working with the Lucene index created by the maven-lucene-plugin, including methods to search on different fields and to delete documents. The overhead of opening the index reader and searcher, and other Lucene-specific work, is taken care of by the dependency.

The maven-lucene-search uses Lucene version 3.0.3.

Example Project

An example project which uses maven-lucene-plugin for indexing and the maven-lucene-search dependency for searching can be found here.

 

This project builds a Lucene index from the file C:\\file\\data.txt. So, either put data.txt in C:\\file\\ with the content provided below, or change the file location in the lucene.xml in the project root to point to the correct file. The index is created when mvn lucene:createIndex or mvn clean install is executed.

The data.txt should have the following data:

names;hobby
albert einstein;invention and discovery
Leonardo da Vinci;Writing Painting

The project also contains a file, SearchExample.java, which demonstrates the use of the LuceneSearcher available in maven-lucene-search by searching the index created by the plugin.

Development and Contribution

The Maven Lucene Plugin is created by Xebia India IT Architects Ltd. It is an open source project hosted at SourceForge. The project URL is http://sourceforge.net/projects/lucene-plugin/.

The source code of this project can be downloaded from the SVN repository at SourceForge here. The repository contains these projects:

1) maven-lucene-plugin – Maven plugin to create the index.
2) maven-lucene-search – Maven dependency to easily search the index created by maven-lucene-plugin.
3) client – Example project which uses maven-lucene-plugin to create an index and maven-lucene-search to search it.

New feature requests can be logged here and the bugs can be logged here. Support requests can be logged here.

If you are interested in contributing to this project, then please contact pranjan@xebia.com.

Future Releases

In future releases, the following features are planned. If you want any specific feature, please log it here.

  • The maven-lucene-plugin should be able to create an index from different data sources (file source, database, web, XML).
  • Create annotations which can be used on fields of classes to index data.
  • Populate a database from a Lucene index using an indexToDataBase goal.
  • Integrate Luke in the maven-lucene-plugin. A Maven goal will be created which opens Luke initialized with the index location provided in lucene.xml.
  • Provide easy-to-use utility methods for querying and updating the Lucene index through an interface in maven-lucene-search.
  • Package the index as an artifact.

6 responses to “Maven Lucene Plugin 1.0 : Documentation and Usage”

  2. Stephan March 6, 2011 at 5:35 pm

    I was wondering if we couldn’t use this plugin to make sonar and jenkins more “search friendly” locating a class in all project analysis / workspaces

    • paritoshranjan March 6, 2011 at 7:32 pm

      Hi Stephan,

      Looks like a good case. The plugin can be used to create indexes at runtime and then search on it.

      I am the developer of maven lucene plugin. I work with couple of guys on this plugin.

      If you want some specific support or feature to make it work in sonar and jenkins, I will be more than happy to develop it for you.

      Cheers,
      Paritosh

  3. Bruce Phillips March 6, 2011 at 6:12 pm

    On Mac OS 10.6, I created the data.txt file in my root folder and changed the lucene.xml sourceFileLocation node to /data.txt.

    When executing mvn -X lucene:createIndex I get

    [INFO] [lucene:createIndex {execution: default-cli}]
    ********Source File Not Found************
    [INFO] ------------------------------------------------------------------------
    [ERROR] FATAL ERROR

    • paritoshranjan March 6, 2011 at 7:27 pm

      I am not aware of Mac OS’s path syntax. However, I can tell you one thing which will work for sure.

      In sourceFileLocation, write data.txt (without /), and keep the data.txt in project root directory (adjacent to pom.xml). I just tested it. It works.

      A Hint for the Path problem, The sourceFileLocation takes the fully qualified path with file name.
