Computers, bikes and things I’d like to remember.

Xena 5.0.0 released

December 16th, 2009 Posted in Computing, General

After a whole lot of interesting work behind the scenes, the team has pulled together and made the latest and greatest version of our Xena digital preservation software available.

The SourceForge download page offers source code, a Mac dmg installer, a Windows exe installer and packages for Sun’s JRE and OpenJDK for anyone.

I think the most exciting new feature is the ability to create a searchable text version of anything that has a text representation. For PDF and DOC files that is just an extraction, but for TIFF images of scanned text documents it means integrating with an Optical Character Recognition (OCR) engine. We’re using Google’s Tesseract, which does a pretty good job of OCR but can be a bit fragile. We managed to find some image content that kills Tesseract with a segfault, but the version in Tesseract’s source control looks better. Anyway, this is a useful step forward for our software.
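For anyone wondering how to live with an OCR engine that occasionally segfaults, here is a minimal sketch of one approach: run the engine as a separate process so that a crash only loses that one document. This is not how Xena does it (Xena is Java, and the names `describe_exit` and `ocr_to_text` are made up for this example). It assumes the `tesseract` command-line tool is on the PATH and is called as `tesseract <image> <output base>`, which writes the text to `<output base>.txt`.

```python
import signal
import subprocess
import sys
from pathlib import Path
from typing import Optional, Sequence


def describe_exit(returncode: int) -> str:
    """Classify a child exit status as reported by subprocess.

    On POSIX, subprocess reports a negative return code -N when the
    child was killed by signal N (for example SIGSEGV on a segfault).
    """
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"crashed ({name})"
    return f"failed (exit {returncode})"


def ocr_to_text(
    image: Path,
    out_base: Path,
    command: Sequence[str] = ("tesseract",),  # assumed to be on the PATH
) -> Optional[str]:
    """Run OCR in a child process; return the text, or None on any failure."""
    try:
        result = subprocess.run(
            [*command, str(image), str(out_base)],
            capture_output=True,
        )
    except FileNotFoundError:
        # The OCR executable is not installed or not on the PATH.
        return None
    if result.returncode != 0:
        # A crash in the engine ends up here instead of taking us down too.
        print(f"OCR {describe_exit(result.returncode)} for {image}", file=sys.stderr)
        return None
    # Tesseract appends ".txt" to the output base name it was given.
    text_file = Path(str(out_base) + ".txt")
    if not text_file.exists():
        return None
    return text_file.read_text(encoding="utf-8")
```

Calling `ocr_to_text(Path("scan.tif"), Path("scan"))` returns the recognised text, or `None` if the engine is missing, fails or crashes, so a batch run can log the problem file and carry on with the next one.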

In addition, we have cut out all of the static jars from external projects that used to live in our source tree, reduced the number of libraries we depend on, added source for those we do depend on and changed our license to GPLv3.

All of this and more besides. Grab a copy, have a play and let us know what you think.

One Response to “Xena 5.0.0 released”

By Brad Hards on Dec 16, 2009

    If you’re working with TIFF files that have TIFF tag 37679, I might be able to give you the text.
