Reputation: 1437
I'm running PySpark with an Elasticsearch backend via the elasticsearch-hadoop connector. I can read from a desired index using:
from pyspark import SparkConf, SparkContext

# Connector settings: es.resource is "index/type", and the index part accepts wildcards
es_read_conf = {
    "es.nodes": "127.0.0.1",
    "es.port": "9200",
    "es.resource": "myIndex_*/myType"
}

conf = SparkConf().setAppName("devproj")
sc = SparkContext(conf=conf)

es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=es_read_conf
)
Works fine. I can wildcard the index.
How do I wildcard the document "type"? Or, how could I get more than one type, or even _all?
Upvotes: 0
Views: 1065
Reputation: 52368
For all types you can use "es.resource": "myIndex_*", i.e. leave the type off the resource entirely.
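A minimal sketch of the question's read configuration with the type omitted (host, port, and index pattern are the question's own placeholders):

# Omitting the "/type" suffix reads every type under the matching indices
es_read_conf = {
    "es.nodes": "127.0.0.1",
    "es.port": "9200",
    "es.resource": "myIndex_*"
}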
For the wildcard part you would need a query:
"prefix": {
"_type": {
"value": "test"
}
}
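A rough, untested sketch of how that prefix query could be handed to the connector through its es.query setting, reusing the read configuration from the question (the "test" prefix value is only the example above):

# Sketch: pass the query DSL as a JSON string via "es.query";
# only documents whose type starts with "test" should be returned
es_read_conf = {
    "es.nodes": "127.0.0.1",
    "es.port": "9200",
    "es.resource": "myIndex_*",
    "es.query": """{
        "query": {
            "prefix": {
                "_type": { "value": "test" }
            }
        }
    }"""
}

es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=es_read_conf
)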
Upvotes: 2