Wednesday, June 8, 2016

AEM Lucene Index was corrupted after unexpected shutdown

We were seeing the following error message in the error.log files after a recent AWS outage unexpectedly shut down the whole EC2 instance:
org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker Could not access the Lucene index at /oak:index/lucene
java.io.EOFException: reached end of stream after reading 0 bytes; 32468 bytes expected
        at com.google.common.io.ByteStreams.readFully(ByteStreams.java:697)
        at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexFile.loadBlob(OakDirectory.java:218)
        at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexFile.readBytes(OakDirectory.java:262)
        at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.readBytes(OakDirectory.java:348)
        at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory$OakIndexInput.readByte(OakDirectory.java:354)
        ...
AEM did not shut down cleanly when the AWS outage occurred. Repeated copies of this log message had pushed the error.log file past 40 GB in size.

The first thing I checked was the "Lucene Index statistics" in the JMX Console. On a working Publish instance, this returned the Lucene Index statistics with an Index Size of 1.2 GB. On our suspect instance, it showed no statistics at all.

To fix this issue, we needed to kick off a Lucene reindex. To do this, log in to CRXDE on the instance, navigate to /oak:index/lucene, and set the reindex property to true.
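
If you would rather script this than click through CRXDE, the same property can usually be set over HTTP. The following is a minimal sketch, not something we ran during this incident. It assumes the Sling POST servlet is reachable on the instance and that the account is allowed to write under /oak:index. The base URL, port and credentials are placeholders, and the function names are made up for illustration.

```python
import base64
import urllib.parse
import urllib.request


def build_reindex_request(base_url, user, password, index_path="/oak:index/lucene"):
    """Build (but do not send) a POST that sets reindex=true on the index node.

    Assumption: the Sling POST servlet handles this path and treats the
    "@TypeHint" suffix as a request to store the value as a Boolean.
    """
    body = urllib.parse.urlencode({
        "reindex": "true",
        "reindex@TypeHint": "Boolean",
    }).encode("ascii")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        base_url.rstrip("/") + index_path, data=body, method="POST"
    )
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


def send(request, timeout=30):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example (placeholder host and credentials; not executed here):
#   send(build_reindex_request("http://localhost:4503", "admin", "admin"))
```

Building the request separately from sending it lets you inspect the URL and body before touching a live instance. Once the reindex starts, Oak should reset the property to false by itself.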

We were unable to log in to CRXDE. Our solution was to restart the instance, log in as soon as it was available, and set the reindex flag. If this solution doesn't work, you can temporarily disable the Oak Lucene bundle (named "Oak Lucene (org.apache.jackrabbit.oak-lucene)"), log in to CRXDE, set the reindex flag, and finally re-enable the Oak Lucene bundle.
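
The bundle stop and start in that fallback can also be scripted. This is a rough sketch, assuming the Felix Web Console is available at /system/console on the instance and accepts a POST with an action parameter for a bundle addressed by its symbolic name. The host, port and credentials are placeholders, and the helper names are hypothetical.

```python
import base64
import urllib.parse
import urllib.request

# Symbolic name of the Oak Lucene bundle, as shown in the bundle list.
OAK_LUCENE_BUNDLE = "org.apache.jackrabbit.oak-lucene"


def build_bundle_request(base_url, user, password, action, bundle=OAK_LUCENE_BUNDLE):
    """Build (but do not send) a POST that stops or starts an OSGi bundle.

    Assumption: the Felix Web Console endpoint
    /system/console/bundles/<symbolic-name> accepts action=stop|start.
    """
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    body = urllib.parse.urlencode({"action": action}).encode("ascii")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    url = base_url.rstrip("/") + "/system/console/bundles/" + bundle
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


def send(request, timeout=30):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Intended order (placeholder host and credentials; not executed here):
#   send(build_bundle_request("http://localhost:4503", "admin", "admin", "stop"))
#   ... set the reindex flag on /oak:index/lucene ...
#   send(build_bundle_request("http://localhost:4503", "admin", "admin", "start"))
```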