Merch

Pig Latin: counting the number of keys in a map for every tuple

I'm trying to get a count of all users in an alias. Each row contains a map of users.

The data looks like this:

([user_name/454543#Paul Kison])
([user_name/43433#Josiel's iPhone,user_name/34343434#Jose's iPAD,user_name/3434645655#Josiel's])

When I call SIZE() on the entire alias, I get this error:

ERROR 1066: Unable to open iterator for alias user_count. Backend error : Scalar has more than one row in the output.

users = LOAD 'hbase://group'
   USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ('n:user_display_name*', '-limit 10')
   as(display_name);

user_count = FOREACH users GENERATE SIZE(users.display_name);

The idea was to take the size of each row's map and then sum those sizes to get the total count.

Answers (1)

Merch

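I had to explicitly declare the display_name column as map[] and pass just the column name to SIZE(), rather than users.display_name. Inside FOREACH users, the expression users.display_name is treated as a scalar projection of the whole users relation, which is only valid when that relation has exactly one row; that is what produced the "Scalar has more than one row in the output" error.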

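-- Declare display_name as a map so SIZE() returns the number of keys in it
users = LOAD 'hbase://group'
   USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ('n:user_display_name*', '-limit 10')
   as(display_name:MAP[]);

-- Refer to the field directly; users.display_name would be a scalar projection
user_count = FOREACH users GENERATE SIZE(display_name);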

After that I summed the result like this:

-- Collect every row into a single group, then add up the per-row map sizes
users_group = GROUP user_count ALL;
total = FOREACH users_group GENERATE SUM(user_count);
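To make the arithmetic concrete, here is a small Python model (not Pig) of what the two steps compute. It assumes the two sample rows from the question are the whole relation and uses a dict to stand in for each row's display_name map: first the number of keys per map (SIZE), then the sum of those counts (GROUP ... ALL plus SUM).

```python
# A plain-Python model of the Pig pipeline above; it is not Pig code.
# Each dict stands in for one row's display_name map[]; keys and values
# are taken from the sample output in the question ("key#value").
rows = [
    {"user_name/454543": "Paul Kison"},
    {
        "user_name/43433": "Josiel's iPhone",
        "user_name/34343434": "Jose's iPAD",
        "user_name/3434645655": "Josiel's",
    },
]

# Step 1: FOREACH users GENERATE SIZE(display_name) -> key count per map
user_count = [len(display_name) for display_name in rows]

# Step 2: GROUP user_count ALL, then SUM(...) -> a single grand total
total = sum(user_count)

print(user_count)  # [1, 3]
print(total)       # 4
```

With the real HBase table, the totals will of course depend on the rows that '-limit 10' actually returns.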
