Reputation: 359
I'm quite a beginner with Pig Latin and I need some (basic, I think) help.
DESCRIBE shows my data as:
xmlToTuple: {(node_attr_id: int,tag: {(tag_attr_k: chararray,tag_attr_v: chararray)})}
and DUMP shows it like this:
((704398904,{(lat,-13.00583333),(lon,45.24166667)}))
((1230941976,{(place,village)}))
((1230941977,{(name,Mtsahara)}))
((1751057677,{(amenity,fast_food),(name,Brochetterie)}))
((100948360,{(amenity,ferry_terminal)}))
((362795028,{(amenity,fuel),(operator,Total)}))
I want to extract the records that have a certain value in the tag_attr_k field. For example, give me the records that contain a tag with tag_attr_k = amenity. That should be:
((1751057677,{(amenity,fast_food),(name,Brochetterie)}))
((100948360,{(amenity,ferry_terminal)}))
((362795028,{(amenity,fuel),(operator,Total)}))
Can anybody explain how to do that? I'm a bit lost…
Upvotes: 3
Views: 4003
Reputation: 359
I found it!
XmlTag = FOREACH xmlToTuple GENERATE FLATTEN ($0);
XmlTag2 = FOREACH XmlTag {
    tag_with_amenity = FILTER tag BY (tag_attr_k == 'amenity');
    GENERATE *, COUNT(tag_with_amenity) AS count;
};
XmlTag3 = FOREACH (FILTER XmlTag2 BY count > 0) GENERATE node_attr_id, node_attr_lon, node_attr_lat, tag;
Upvotes: 2
Reputation: 39893
You should be using a map for this, not a list of tuples. Maps are built for exactly this purpose: http://pig.apache.org/docs/r0.10.0/basic.html#data-types
To keep only the records that have an amenity key, you'd do:
B = FILTER A BY mymap#'amenity' IS NOT NULL;
Upvotes: 1
Reputation: 5801
You should use a map instead of a bag of tuples. The keys will be your tag_attr_k values, and the values will be your tag_attr_v values. So one line of your data would be, e.g.,
(1751057677,['amenity'#'fast_food', 'name'#'Brochetterie'])
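Then you can check whether a key exists by attempting to access it and checking whether the value is NULL. A lookup on a missing key returns NULL, so the filter keeps only the records that have the key:
filtered = FILTER xml BY tag_attr#'amenity' IS NOT NULL;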
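Put together, the map approach could look roughly like the sketch below. It assumes the data can be reloaded with the tags already stored as a map. The file name nodes.txt, the relation names, and the map[chararray] schema are made up for illustration and do not come from the question. If your Pig version rejects the typed map, an untyped map[] should work the same way for this filter.

```pig
-- Hypothetical input: one node per line, tab-separated, e.g.
-- 1751057677	[amenity#fast_food,name#Brochetterie]
xml = LOAD 'nodes.txt' AS (node_attr_id:int, tag_attr:map[chararray]);

-- Looking up a key that is not in the map yields NULL,
-- so this keeps only the nodes that carry an 'amenity' tag.
with_amenity = FILTER xml BY tag_attr#'amenity' IS NOT NULL;

-- Optionally pull the value out as its own column.
amenities = FOREACH with_amenity GENERATE node_attr_id, tag_attr#'amenity' AS amenity;

DUMP amenities;
```

Note that this does not convert the existing bag of (tag_attr_k, tag_attr_v) tuples into a map. That step would have to happen where the XML is parsed, or in a UDF.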
Upvotes: 3