Reputation: 93
I'm trying to load data from a MongoDB BSON file into Pig using com.mongodb.hadoop.pig.BSONLoader (https://github.com/mongodb/mongo-hadoop/blob/master/pig/README.md), but I'm stuck. The documents contain variable-size arrays, and I'm not sure how to declare them in the Pig schema (as a tuple?). Here's a sample record from MongoDB:
{
  "_id": {"$oid": "52fbbca6e4b029a79cd17ff7"},
  "field": "value",
  "variableSizeArray": [
    "value1",
    "value2",
    "valueN"
  ]
}
I've tried the following options and none of them seems to work:
raw = LOAD 'file:///tmp/teststreams.bson' using com.mongodb.hadoop.pig.BSONLoader('','field:chararray,variableSizeArray:()');
raw = LOAD 'file:///tmp/teststreams.bson' using com.mongodb.hadoop.pig.BSONLoader('','field:chararray,variableSizeArray:{T:(h:chararray)}');
Thanks for any help on this.
Upvotes: 0
Views: 481
Reputation: 93
Finally figured it out. The way to do this is to list only the field names in the schema string and leave out the data types. This works:
raw = LOAD 'file:///tmp/teststreams.bson' using com.mongodb.hadoop.pig.BSONLoader('','field,variableSizeArray');
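To see how the untyped load actually represents the array, it may help to inspect the relation right after loading. This is only a sketch: it assumes the mongo-hadoop jars are already registered in the session, `first_rows` is an alias made up for illustration, and I have not confirmed which Pig type the loader assigns to the array.

```
-- Load with field names only; no data types are declared in the schema string
raw = LOAD 'file:///tmp/teststreams.bson' USING com.mongodb.hadoop.pig.BSONLoader('', 'field,variableSizeArray');

-- Print the schema Pig reports for the relation
DESCRIBE raw;

-- Dump a handful of records to see how variableSizeArray comes through
first_rows = LIMIT raw 5;
DUMP first_rows;
```

DESCRIBE shows the declared schema, while DUMP shows the values as loaded, so comparing the two should indicate whether the array needs any further handling before use.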
Upvotes: 2