Dan

How do you load an array from a BSON file into Pig using mongo-hadoop?

I'm trying to load data from a MongoDB BSON file into Pig using com.mongodb.hadoop.pig.BSONLoader (https://github.com/mongodb/mongo-hadoop/blob/master/pig/README.md), but I'm stuck. The data in MongoDB includes variable-size arrays, and I'm not sure how to load them into Pig (as a tuple?). Here's a sample record from MongoDB:

{"_id": {"$oid": "52fbbca6e4b029a79cd17ff7"},
 "field": "value",
 "variableSizeArray": [
    "value1",
    "value2",
    "valueN"
 ]
}
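To reproduce this without a running MongoDB, you need a .bson file containing a record like the one above. Normally you would get one from mongodump. As an alternative, here is a minimal sketch that hand-encodes the sample record using only the Python standard library, following the BSON spec and supporting just the string, array and ObjectId element types. The helper names (`encode_document`, `ObjectIdHex`, `write_bson_file`) are hypothetical, and the output path simply mirrors the one used in the LOAD statements.

```python
import struct
from dataclasses import dataclass

# BSON element type bytes, as defined in the BSON spec (bsonspec.org)
TYPE_STRING = 0x02
TYPE_ARRAY = 0x04
TYPE_OBJECTID = 0x07


@dataclass
class ObjectIdHex:
    """Hypothetical stand-in for an ObjectId: 24 hex characters (12 bytes)."""
    hex_string: str

    def to_bytes(self) -> bytes:
        raw = bytes.fromhex(self.hex_string)
        if len(raw) != 12:
            raise ValueError("an ObjectId must be exactly 12 bytes")
        return raw


def cstring(s: str) -> bytes:
    """Encode a key as a null-terminated UTF-8 C string."""
    return s.encode("utf-8") + b"\x00"


def encode_value(value):
    """Return (type_byte, payload) for the few value types this sketch supports."""
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # int32 length (including the trailing NUL), then the bytes
        return TYPE_STRING, struct.pack("<i", len(data)) + data
    if isinstance(value, list):
        # A BSON array is an embedded document keyed "0", "1", "2", ...
        return TYPE_ARRAY, encode_document({str(i): v for i, v in enumerate(value)})
    if isinstance(value, ObjectIdHex):
        return TYPE_OBJECTID, value.to_bytes()
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_document(doc: dict) -> bytes:
    """Encode a dict as one BSON document: int32 length, elements, NUL."""
    body = b""
    for key, value in doc.items():
        type_byte, payload = encode_value(value)
        body += bytes([type_byte]) + cstring(key) + payload
    body += b"\x00"  # document terminator
    # The length prefix counts its own 4 bytes plus the body
    return struct.pack("<i", len(body) + 4) + body


def write_bson_file(path: str, docs) -> None:
    """Write documents back to back, which is the layout of a mongodump .bson file."""
    with open(path, "wb") as f:
        for doc in docs:
            f.write(encode_document(doc))


record = {
    "_id": ObjectIdHex("52fbbca6e4b029a79cd17ff7"),
    "field": "value",
    "variableSizeArray": ["value1", "value2", "valueN"],
}

if __name__ == "__main__":
    write_bson_file("/tmp/teststreams.bson", [record])
```

Running it writes /tmp/teststreams.bson with a single record. Note that the array is stored as an embedded document keyed by index, so nothing in the file fixes its length; records with longer or shorter arrays can be added to the `docs` list the same way.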

I've tried the following options, and neither of them works:

raw = LOAD 'file:///tmp/teststreams.bson' using com.mongodb.hadoop.pig.BSONLoader('','field:chararray,variableSizeArray:()');
raw = LOAD 'file:///tmp/teststreams.bson' using com.mongodb.hadoop.pig.BSONLoader('','field:chararray,variableSizeArray:{T:(h:chararray)}');

Thanks for any help on this.

Answers (1)

Dan

Finally figured it out. The way to do this is to leave out the data types and list only the field names in the schema string. This works:

raw = LOAD 'file:///tmp/teststreams.bson' using com.mongodb.hadoop.pig.BSONLoader('','field,variableSizeArray');
