FIDIL

Reputation: 117

"substr" function in Apache Pig

I have the following user data stored in Apache Hadoop:

21796346,83637,2990666,1,2,false,0,0
21827841,15748,8754621,1,7,true,0,1

The first 4 digits of the first field represent the user type. The second field identifies the department.

I would like to count the number of users of each user type in each department. The SQL statement would be:

select dept_id, substr(User_Id,1,4) as user_type, count(*) as number_of_users from users group by dept_id,substr(User_Id,1,4)

I could not figure out how to express the substr function in Pig.
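For reference, the result the SQL query above is meant to produce can be modeled in a few lines of Python. This is a hypothetical sketch, not Pig code: it assumes comma-separated input with User_Id in the first field and dept_id in the second, and the function name is made up for illustration.

```python
from collections import Counter

# The two sample rows from the question (comma-separated, 8 fields each).
SAMPLE = [
    "21796346,83637,2990666,1,2,false,0,0",
    "21827841,15748,8754621,1,7,true,0,1",
]


def count_users_by_dept_and_type(lines):
    """Model of: SELECT dept_id, substr(User_Id,1,4), COUNT(*) ... GROUP BY both."""
    counts = Counter()
    for line in lines:
        fields = line.split(",")
        user_id, dept_id = fields[0], fields[1]
        # SQL substr(User_Id, 1, 4): 1-based start, length 4 -> first four characters
        user_type = user_id[:4]
        counts[(dept_id, user_type)] += 1
    return counts


if __name__ == "__main__":
    for (dept_id, user_type), n in sorted(count_users_by_dept_and_type(SAMPLE).items()):
        print(dept_id, user_type, n)
```

On the two sample rows, each (department, user type) pair occurs once.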

Upvotes: 3

Views: 6398

Answers (2)

user3387616

Reputation: 81

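You could use SUBSTRING in Pig. Unlike SQL's substr, its start index is zero-based and its end index is exclusive, so the first four characters of User_Id are SUBSTRING(User_Id, 0, 4). Since your data is comma-separated and you want the counts per department, load with PigStorage(',') and group by both fields:

A = LOAD 'DATA' USING PigStorage(',') AS (User_Id:chararray, dept_id:chararray, var2, var3, var4, var5, var6, var7);
-- SUBSTRING is 0-based with an exclusive end index, so (0, 4) yields the first 4 characters
B = FOREACH A GENERATE dept_id, SUBSTRING(User_Id, 0, 4) AS user_type;
C = GROUP B BY (dept_id, user_type);
D = FOREACH C GENERATE FLATTEN(group) AS (dept_id, user_type), COUNT(B) AS number_of_users;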
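SQL's substr and Pig's SUBSTRING use different index conventions, which makes it easy to carry SUBSTRING(User_Id, 1, 4) over from SQL by mistake. The small Python model below makes the off-by-one concrete. The helper names are hypothetical, and they model only the indexing for in-range arguments, not how either system handles out-of-range indexes or nulls.

```python
def sql_substr(s, start, length):
    """SQL-style substr: 1-based start position followed by a character count."""
    return s[start - 1:start - 1 + length]


def pig_substring(s, start_index, stop_index):
    """Pig SUBSTRING indexing: 0-based start index, exclusive stop index."""
    return s[start_index:stop_index]


if __name__ == "__main__":
    user_id = "21796346"
    print(sql_substr(user_id, 1, 4))     # 2179: the first four characters
    print(pig_substring(user_id, 1, 4))  # 179: skips the first character
    print(pig_substring(user_id, 0, 4))  # 2179: matches the SQL result
```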

To get the total number of users, you could use GROUP A ALL and then COUNT(A).

Upvotes: 3

reo katoa

Reputation: 5811

You can find the complete list of Pig's built-in functions in the Pig documentation. The function you are looking for is called SUBSTRING. Note that function names in Pig are case-sensitive, so it must be written as SUBSTRING.

Upvotes: 2
