Patricio Page

Problem passing crawler configuration yaml files to stormcrawler

I have Storm running locally as a single-machine setup. I want to submit a topology with an alternative YAML configuration file for the crawler, but I get an error because the topology cannot find a property that is defined in that file.

I am trying to submit the topology to the Storm cluster with this command:

storm jar stormcrawler.jar topology.BasicTopology -conf conf/crawler-conf.yaml

The crawler-conf.yaml contains the following properties:

   config:
       topology.workers: 1
       topology.es.spouts: 1

When I run the script I get this error:

Exception in thread "main" java.lang.NullPointerException
        at topology.BasicTopology.run(BasicTopology.java:78) 

This is the bit of code in the BasicTopology class:

    @Override
    protected int run(String[] args) {
        log.info(this.conf.keySet().toString());
        int nbWorkers = (int) this.conf.get("topology.workers");   // <--- NPE
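A minimal sketch, assuming the configuration is exposed as a Map<String, Object> like this.conf above. The helper name requireInt is hypothetical. It fails with a descriptive message instead of an NPE when the key is missing (for instance because the YAML file was never loaded), and it reads the value through Number.intValue(), because a plain (int) cast throws a ClassCastException if the value happens to be a Long rather than an Integer.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfUtil {

    // Hypothetical helper: reads an integer setting from a Storm-style config map.
    // Throws IllegalStateException naming the key if it is absent, instead of the
    // NullPointerException that (int) conf.get(key) produces when the value is null.
    static int requireInt(Map<String, Object> conf, String key) {
        Object value = conf.get(key);
        if (value == null) {
            throw new IllegalStateException(
                "Missing config key '" + key + "'; was the -conf YAML file loaded?");
        }
        if (value instanceof Number) {
            // Accepts Integer, Long, etc.; (int) on a boxed Long would fail
            return ((Number) value).intValue();
        }
        // Fall back to parsing string values such as "1"
        return Integer.parseInt(value.toString().trim());
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.workers", 1L); // stored as a Long to exercise the Number path
        System.out.println(requireInt(conf, "topology.workers")); // prints 1

        try {
            requireInt(conf, "topology.es.spouts"); // key not present in this map
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This does not fix the underlying problem (the file still is not loaded), but it turns a bare NPE into an error that points at the missing configuration.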

As far as I can tell, the problem is that the storm.py script treats "-conf" as the "-c" (common config) flag and passes it on as a Storm option: it reads "onf" as the value of "-c", so it runs Storm with

-Dstorm.options=onf

After "onf" has been consumed as a Storm option, the only argument passed to the topology is "conf/crawler-conf.yaml". Since that argument is not preceded by "-conf", the YAML file is never parsed and its properties are not loaded, so this.conf.get("topology.workers") returns null.
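To make this failure mode concrete, here is a simplified, hypothetical sketch of the kind of argument handling a configurable topology performs: a file is only loaded when "-conf" is followed by a path. This is not StormCrawler's actual code; the class ConfArgs and the method confFiles are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

final class ConfArgs {

    // Hypothetical parser: collects every path that follows a "-conf" flag
    // and puts all other arguments into 'remaining'.
    static List<String> confFiles(String[] args, List<String> remaining) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-conf".equals(args[i]) && i + 1 < args.length) {
                files.add(args[++i]); // consume the path after -conf
            } else {
                remaining.add(args[i]); // not a config flag: leave it for the topology
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> rest = new ArrayList<>();
        // What the topology receives once storm.py has swallowed "-conf" as "-c onf"
        String[] broken = {"conf/crawler-conf.yaml"};
        System.out.println(confFiles(broken, rest) + " " + rest);
        // prints [] [conf/crawler-conf.yaml]

        rest.clear();
        // What it receives when "-conf" reaches the topology intact
        String[] intact = {"-conf", "conf/crawler-conf.yaml"};
        System.out.println(confFiles(intact, rest) + " " + rest);
        // prints [conf/crawler-conf.yaml] []
    }
}
```

With the first argument list no configuration file is found, so nothing is merged into the config and the later lookup returns null.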

This did not happen with Storm 1.2.2 but does with 2.3.0, where argparse was added to the storm.py script.

Answers (1)

Julien Nioche

Try

storm local target/stormcrawler.jar --local-ttl 3600 topology.BasicTopology -- -conf conf/crawler-conf.yaml
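Why the "--" helps, sketched under assumptions: argparse (which the question notes storm.py now uses) lets a single-letter option take its value glued on, so "-conf" can be read as "-c" with value "onf", and it treats "--" as the end of options, passing what follows through as plain arguments. The hypothetical Java model below applies only those two rules to the arguments after the main class, with "-c" as the sole known option; it is a simplification, not a port of storm.py.

```java
import java.util.ArrayList;
import java.util.List;

final class LauncherArgs {

    // Simplified model of two argparse rules (hypothetical, not storm.py itself):
    //  1. the short option "-c" may carry its value glued on: "-conf" -> -c "onf"
    //  2. "--" ends option parsing; everything after it is positional
    static List<String> split(String[] argv, List<String> stormOptions) {
        List<String> positional = new ArrayList<>();
        boolean optionsEnded = false;
        for (int i = 0; i < argv.length; i++) {
            String a = argv[i];
            if (!optionsEnded && a.equals("--")) {
                optionsEnded = true; // the separator itself is dropped
            } else if (!optionsEnded && a.equals("-c") && i + 1 < argv.length) {
                stormOptions.add(argv[++i]); // value given as the next argument
            } else if (!optionsEnded && a.startsWith("-c") && a.length() > 2) {
                stormOptions.add(a.substring(2)); // value glued to the flag
            } else {
                positional.add(a); // handed on to the topology
            }
        }
        return positional;
    }

    public static void main(String[] args) {
        List<String> opts = new ArrayList<>();
        String[] noSeparator = {"-conf", "conf/crawler-conf.yaml"};
        System.out.println(split(noSeparator, opts) + " options=" + opts);
        // prints [conf/crawler-conf.yaml] options=[onf]

        opts.clear();
        String[] withSeparator = {"--", "-conf", "conf/crawler-conf.yaml"};
        System.out.println(split(withSeparator, opts) + " options=" + opts);
        // prints [-conf, conf/crawler-conf.yaml] options=[]
    }
}
```

In this model the first call reproduces the behaviour described in the question, while the second leaves "-conf conf/crawler-conf.yaml" intact for the topology.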

See https://github.com/DigitalPebble/storm-crawler/tree/master/archetype/src/main/resources/archetype-resources for an example of what is generated from the archetype.

If you are porting an existing StormCrawler 1.x project to 2.x, it might be a good idea to start afresh from the archetype and then add the parts that are specific to your application. Quite a few things have changed since 1.x, and this is a good way of making sure that you haven't forgotten anything.

I would also recommend that you consider using Flux files as they are more flexible than a hard-coded topology. Most users, including myself, use them.
