PJ.

Reputation: 1206

Error during node startup: Unable to start DSE server / Plugin activation failed / Cannot find core

I've been having these issues for quite a while, but I ignored them initially because I could still start my nodes. However, one of them has recently become serious enough that it now takes me many tries to start a node successfully.

Issue #1: Unable to start DSE server / Plugin activation failed / Cannot find core

ERROR [main] 2015-01-28 03:30:40,058 DseDaemon.java (line 492) Unable to start DSE server.
java.lang.RuntimeException: com.datastax.bdp.plugin.PluginManager$PluginActivationException: Plugin activation failed
        at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:135)
        at com.datastax.bdp.server.DseDaemon.start(DseDaemon.java:480)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:509)
        at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:659)
Caused by: com.datastax.bdp.plugin.PluginManager$PluginActivationException: Plugin activation failed
        at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:284)
        at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:128)
        ... 3 more
Caused by: java.lang.IllegalStateException: Cannot find core: myks.mycf
        at com.datastax.bdp.search.solr.core.SolrCoreResourceManager.doWaitForCore(SolrCoreResourceManager.java:742)
        at com.datastax.bdp.search.solr.core.SolrCoreResourceManager.waitForCore(SolrCoreResourceManager.java:478)
        at com.datastax.bdp.plugin.SolrContainerPlugin.waitForSecondaryIndexesLoading(SolrContainerPlugin.java:237)
        at com.datastax.bdp.plugin.SolrContainerPlugin.onActivate(SolrContainerPlugin.java:98)
        at com.datastax.bdp.plugin.PluginManager.initialize(PluginManager.java:334)
        at com.datastax.bdp.plugin.PluginManager.activate(PluginManager.java:263)
        ... 4 more
 INFO [Thread-3] 2015-01-28 03:30:40,059 DseDaemon.java (line 505) DSE shutting down...
 INFO [StorageServiceShutdownHook] 2015-01-28 03:30:40,164 Gossiper.java (line 1307) Announcing shutdown
 INFO [Thread-3] 2015-01-28 03:30:40,620 PluginManager.java (line 356) All plugins are stopped.
 INFO [Thread-3] 2015-01-28 03:30:40,620 CassandraDaemon.java (line 463) Cassandra shutting down...
 INFO [StorageServiceShutdownHook] 2015-01-28 03:30:42,165 MessagingService.java (line 701) Waiting for messaging service to quiesce
 INFO [ACCEPT-/144.76.201.233] 2015-01-28 03:30:42,814 MessagingService.java (line 941) MessagingService has terminated the accept() thread

This exception started as a "mild" issue - mild because, although it prevents a node from starting up when it happens, it usually took only one more try to start the affected node successfully. However, about two weeks ago, after not having restarted any of my nodes for quite a while, I discovered that I now need far more attempts (20+) to start a node.

From the stack trace, it looks like a timeout issue (in doWaitForCore()), but I cannot find a setting to increase how long DSE waits for a core to load during startup before giving up. The core mentioned in the stack trace is always the same one, and I assume that is because it is my biggest core (~1.4 billion records) and takes the longest to load. Yet when I do manage to start the node successfully, there are no signs of errors - I can query that core like any other core.

--

There are two other issues that may or may not be related to the one above. Both always appear during startup, and unlike the first one, they do not cause a startup failure (i.e. they also appear when a node starts successfully).

Issue #2: Invalid Number: static

ERROR [searcherExecutor-67-thread-1] 2015-01-28 04:26:49,691 SolrException.java (line 124) org.apache.solr.common.SolrException: Invalid Number: static
        at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:396)
        at org.apache.solr.schema.FieldType.getFieldQuery(FieldType.java:697)
        at org.apache.solr.schema.TrieField.getFieldQuery(TrieField.java:343)
        at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:741)
        at org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:545)
        at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
        at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
        at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
        at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
        at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:153)
        at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
        at org.apache.solr.search.QParser.getQuery(QParser.java:143)
        at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:135)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:183)

I looked at the data that I imported and couldn't find a supposedly numeric value that was incorrectly supplied as "static". In the Java application that I wrote to convert CSVs to SSTables, I cast all numeric values to int/long/double depending on the field type, so I honestly don't think this has anything to do with my data.
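To illustrate, the casting step works roughly like this (a simplified sketch with made-up names, not the actual converter code):

// Every CSV cell destined for a numeric column is parsed into the matching
// Java type before it is handed to the SSTable writer, so a stray string
// like "static" would fail fast here rather than end up in a numeric column.
public final class NumericCast {
    static Object toColumnValue(String csvValue, String cqlType) {
        switch (cqlType) {
            case "int":    return Integer.parseInt(csvValue.trim());
            case "bigint": return Long.parseLong(csvValue.trim());
            case "double": return Double.parseDouble(csvValue.trim());
            default:       return csvValue; // text columns pass through unchanged
        }
    }
}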

Issue #3: Could not getStatistics on info bean com.datastax.bdp.search.solr.FilterCacheMBean

WARN [SolrSecondaryIndex myks.mycf2 index initializer.] 2015-01-28 04:26:51,770 JmxMonitoredMap.java (line 256) Could not getStatistics on info bean com.datastax.bdp.search.solr.FilterCacheMBean
java.lang.RuntimeException: java.lang.ClassCastException: org.apache.lucene.search.FieldCache$CreationPlaceholder cannot be cast to org.apache.solr.search.SolrCache
        at com.datastax.bdp.search.solr.FilterCacheMBean.getStatistics(FilterCacheMBean.java:185)
        at org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:236)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
        at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:140)
        at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51)
        at com.datastax.bdp.search.solr.core.CassandraCoreContainer.registerExtraMBeans(CassandraCoreContainer.java:679)
        at com.datastax.bdp.search.solr.core.CassandraCoreContainer.register(CassandraCoreContainer.java:427)
        at com.datastax.bdp.search.solr.core.CassandraCoreContainer.doLoad(CassandraCoreContainer.java:757)
        at com.datastax.bdp.search.solr.core.CassandraCoreContainer.load(CassandraCoreContainer.java:162)
        at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex$2.run(AbstractSolrSecondaryIndex.java:882)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: org.apache.lucene.search.FieldCache$CreationPlaceholder cannot be cast to org.apache.solr.search.SolrCache
        at com.datastax.bdp.search.solr.FilterCacheMBean.getStatistics(FilterCacheMBean.java:174)
        ... 16 more

I have absolutely no idea what this is.

--

Has anyone encountered these errors/exceptions/warnings before? What did you do?

Upvotes: 0

Views: 1877

Answers (2)

sbtourist

Reputation: 726

Regarding issue #2, are you sure you don't have a QuerySenderListener with a wrong warmup query in your solrconfig?
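For reference, a warmup-query listener in solrconfig.xml looks roughly like the sketch below (the field name is made up). A term like "static" sent against a Trie numeric field during searcher warmup would produce exactly this "Invalid Number" error on the searcherExecutor thread at startup.

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- a warmup query like this, aimed at a numeric field, fails with "Invalid Number: static" -->
    <lst>
      <str name="q">my_numeric_field:static</str>
    </lst>
  </arr>
</listener>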

Upvotes: 1

eribeiro

Reputation: 592

Issue #1: The maximum time to wait for a core to load was hard-coded at 1 minute. So your assumption is right: a very large core, or hundreds of cores, could prevent the node from starting because loading takes longer than that. In the next patch releases (4.5.6, 4.6.1) we address this by adding a new option, load_max_time_per_core, to dse.yaml. It lets you raise the maximum waiting time per core above the 1-minute default; for 500 cores you would need to increase load_max_time_per_core to about 3 minutes, for example.
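Once you are on a release that has it, the setting would look something like this in dse.yaml (the value here is only an illustration; it is the maximum number of minutes to wait per core):

# dse.yaml
load_max_time_per_core: 10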

Issue #2: Unfortunately, I don't know what could be causing this; we would need more information to see why it's happening.

Issue #3: We are currently investigating what this could be.

Upvotes: 2
