I am very new to Pig Latin, so please excuse any ignorance in the following question. I have inherited some code that does essentially the following:
USERS = LOAD 'hbase://some_table' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('s:*', '-caster HBaseBinaryConverter --limit some_limit') AS (user_map:map[chararray]);
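As far as I understand HBaseStorage, a column spec such as 's:*' loads every column qualifier in the s family into a single Pig map, with the qualifier as the key and the cell value as the value. That would explain why each row of user_map is keyed by column names. The Python below only models that assumption with hypothetical (family, qualifier, value) cells; family_to_map is an illustrative helper, not part of HBaseStorage.

```python
# Hypothetical model, NOT HBaseStorage's implementation: each HBase cell is
# a (family, qualifier, value) tuple. Loading 's:*' keeps only family "s"
# and uses each qualifier as a map key, which is what user_map holds.


def family_to_map(cells, family):
    """Collect all qualifiers of one column family into a dict."""
    row_map = {}
    for fam, qualifier, value in cells:
        if fam == family:
            row_map[qualifier] = value
    return row_map


if __name__ == "__main__":
    cells = [
        ("s", "n1", '{"s":{"added": 1430668638000}}'),
        ("s", "n2", '{"s":{"added": 1430668638000}}'),
        ("other", "x", "ignored"),  # a different family is not loaded
    ]
    print(family_to_map(cells, "s"))
```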
Now, if I do a DUMP of USERS, I get something like the following (fake data):
([n1#{"s":{"added": 1430668638000, "lastseen": 1430668638000, "expires": 1433260638000}},n2#{"s":{"added": 1430668638000, "lastseen": 1430668638000, "expires": 1433692638000}},n22#{"segment":{"added": 1430668638000, "lastseen": 1430668638000, "expires": 1433260638000}},n3#{"s":{"added": 1430668638000, "lastseen": 1430668638000, "expires": 1433692638000}},n4#{"segment":{"added": 1430668638000, "lastseen": 1430668638000, "expires": 1433692638000}}])
([n8#{"s":{"added": 1428792426000, "lastseen": 1428792426000, "expires": 1431816426000}},n9#{"segment":{"added": 1428792426000, "lastseen": 1428792426000, "expires": 1431816426000}},n11#{"segment":{"added": 1428792426000, "lastseen": 1428792426000, "expires": 1431816426000}}])
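The values in this dump look like JSON text. Assuming they are valid JSON (the sample is fake data, so that is an assumption), each value can be split into its single top-level key ("s" or "segment") and the inner added/lastseen/expires fields. parse_value is a hypothetical helper name used only for this sketch.

```python
import json


def parse_value(raw):
    """Split one map value into (top-level key, inner fields).

    Assumes raw is JSON with exactly one top-level key, as every value in
    the sample dump has ("s" or "segment"); any other number of keys makes
    the unpacking below raise ValueError.
    """
    doc = json.loads(raw)
    (outer_key, fields), = doc.items()
    return outer_key, fields


if __name__ == "__main__":
    raw = ('{"segment":{"added": 1428792426000, '
           '"lastseen": 1428792426000, "expires": 1431816426000}}')
    outer_key, fields = parse_value(raw)
    print(outer_key, fields["expires"])
```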
Essentially, I want to get at the n* entries (n1, n2, ...) in the output, but I am not sure how to break them down from this schema. Any help would be greatly appreciated.
To explain my question a bit more: perhaps my understanding of the map[chararray] schema (and how to manipulate it) is lacking.
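One way to think about map[chararray]: each row's user_map behaves much like a Python dict whose keys are strings (n1, n2, ...) and whose values are chararray (here, the JSON text). In Pig, user_map#'n1' looks up a single key; the n* names themselves are the keys, so collecting them means enumerating the map's keys rather than its values. The Python analogy below uses abbreviated values from the first dumped row and is only an illustration, not how Pig stores maps internally.

```python
# Python analogy for one row of USERS: user_map as map[chararray],
# i.e. string keys mapped to string values (JSON text abbreviated here).
user_map = {
    "n1": '{"s":{"added": 1430668638000}}',
    "n2": '{"s":{"added": 1430668638000}}',
    "n22": '{"segment":{"added": 1430668638000}}',
}

# Pig's user_map#'n1' is a lookup by key, like user_map["n1"] here.
n1_value = user_map["n1"]

# The n* names are the keys, so "getting the n*" means listing the keys.
titles = sorted(user_map.keys())
print(titles)  # ['n1', 'n2', 'n22']
```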
EDIT: My expected output would be all of the n* information stored in a relation called TITLES, so that when I do DUMP TITLES I get the following:
n1#
n2# ...
I was able to answer my own question by writing a Python UDF. In Pig, the call looks like this:
N_S = FOREACH USERS GENERATE my_udfs.translate_map(user_map);
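My Python UDF looks something like this:
@outputSchema("doc:chararray")
def translate_map(input):
    # Pig passes None when the map itself is null
    if input is None:
        return None
    # Concatenate every map key (n1, n2, ...) into one space-separated string;
    # only the keys are needed, not the values
    n_str = ""
    for k in input.keys():
        n_str += str(k)
        n_str += " "
    return n_str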
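The translate_map UDF returns one space-separated string per row, whereas the EDIT asks for one n* per line in TITLES. A possible variant, sketched here under assumptions, returns the keys as a bag so that Pig can FLATTEN it into one record per key. It relies on Pig's Jython type mapping (a Python list of tuples becomes a bag of tuples). The function name map_keys_bag and the no-op outputSchema fallback are hypothetical; the fallback only exists so the file can be imported and tested outside Pig, where the decorator is not defined.

```python
# Hypothetical UDF sketch, written to run under Pig's Jython engine
# (Python 2 syntax) as well as plain CPython for testing.

try:
    outputSchema  # provided by Pig when the file is registered with jython
except NameError:
    def outputSchema(schema):
        """No-op stand-in so this file can be imported outside Pig."""
        def decorator(func):
            return func
        return decorator


@outputSchema("keys:bag{t:tuple(key:chararray)}")
def map_keys_bag(user_map):
    """Return the map's keys as a bag (list) of single-field tuples."""
    if user_map is None:  # Pig passes None for a null map
        return None
    # Sorted for a deterministic order; this is lexicographic, so "n11" < "n2"
    return [(str(k),) for k in sorted(user_map.keys())]
```

If the file is registered along the lines of REGISTER 'my_udfs_sketch.py' USING jython AS my_udfs; (hypothetical file name), something like TITLES = FOREACH USERS GENERATE FLATTEN(my_udfs.map_keys_bag(user_map)); should give one key per record on DUMP TITLES. Drop the sorted() call if the map's own iteration order is acceptable.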