user2631600

Reputation: 759

Issue with Complex data types in pig

I am new to Pig programming. So far I have mostly worked with simple data types. When I try to study complex data types, I cannot find proper examples with input and output. Can anyone explain the complex data types, especially the map data type, in detail with real examples? Thanks in advance.

Upvotes: 1

Views: 2872

Answers (1)

Kishore

Reputation: 5891

Pig has three complex types: maps, tuples and bags. These complex types can contain scalar types and other complex types. So it is possible to have a map whose value is a bag containing a tuple, one of whose fields is itself a map.

Map: A map is a mapping from chararray keys to data elements, expressed as key-value pairs. The key must always be of type chararray and can be used as an index to access the associated value. The values in a map do not all have to be of the same type.

Map constants are defined by square brackets with '#' separating keys from values and ',' separating key-value pairs.

 ['Name'#'John', 'Age'#22]  

The above defines a map constant with two key-value pairs. Notice that the keys are always of type chararray while values take type chararray and int respectively.

In order to load data from files as maps, the data should be structured as below:

 [this#1.9, is#2.5]  
 [my#3.3, vocabulary#4.1] 

Sample Pig Latin statements to load the above data sample as maps:

 grunt> mapdata = load 'MapData' as (a:map[float]);  
 grunt> values = foreach mapdata generate a#'this' as value;  
 grunt> value = FILTER values BY value is not null;  
 grunt> dump value;

The output of the above statements is:

 (1.9)

The load statement constructs two maps with two key-value pairs each. Notice that we specify the data type of the values as 'float' in the load statement. We can also choose not to specify the type of the values, as below:

 grunt> mapdata = load 'MapData' as (a:map[]); 

In this case Pig assumes the values are of type bytearray and performs implicit casts to the appropriate type depending on how your Pig Latin statements handle the data.
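
If you would rather not rely on implicit casts, the value can be cast explicitly when it is read out of an untyped map. This is only a minimal sketch that reuses the 'MapData' file from above; the alias name typed is made up for illustration.

 grunt> mapdata = load 'MapData' as (a:map[]);
 grunt> typed = foreach mapdata generate (float)a#'this' as value;
 grunt> describe typed;

Here describe should report value as a float rather than a bytearray.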

In the second statement we are trying to retrieve the value associated with 'this'. Notice the syntax

a#'this'

which will return 1.9.
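
When a key is not present in a map, the lookup gives null instead of failing, which is why the FILTER ... is not null step above leaves only one row. A small sketch on the same 'MapData' file, looking up two keys at once (the alias lookups and the field names this_val and my_val are made up for illustration):

 grunt> mapdata = load 'MapData' as (a:map[float]);
 grunt> lookups = foreach mapdata generate a#'this' as this_val, a#'my' as my_val;
 grunt> dump lookups;

Each output tuple should have a value in one field and null in the other, since each input map contains only one of the two keys.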

Tuple: A tuple is a fixed-length, ordered collection of Pig data elements. Tuples contain fields, which may be of different Pig types. A tuple is analogous to a row in SQL, with the fields as columns.

Since tuples are ordered, it is possible to reference a field by its position in the tuple. A tuple can, but is not required to, declare a schema which describes each field's data type and provides a name for the field.

Tuple constants use parentheses to delimit the tuple and commas to separate the fields.

 ('John', 25) 

The above declares a tuple constant with two fields of data types, chararray and int respectively.

 grunt> data = load 'StudentData';  
 grunt> finaldata = foreach data generate $0;  
 grunt> dump finaldata;

In the above statements, data is an outer bag (the explanation of bags comes next) which contains the tuples loaded from the StudentData file. Notice that we did not declare a schema for the tuples (the type and name of the fields contained in the tuple). In this case, the schema for the tuple is unknown.

However, we can still reference individual fields in the tuple by their position ($0 refers to the first field in the tuple).

 grunt> data = load 'StudentData' as (name:chararray, age:int);
 grunt> finaldata = foreach data generate name;
 grunt> dump finaldata;

In this case, we have defined a schema for the tuples.
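
To check which schema Pig has attached to the tuples, you can use describe. The sketch below assumes the same 'StudentData' file as above; the alias by_pos is made up for illustration. It also shows that positional references keep working once a schema is declared.

 grunt> data = load 'StudentData' as (name:chararray, age:int);
 grunt> describe data;
 grunt> by_pos = foreach data generate $1;
 grunt> dump by_pos;

Here $1 refers to the second field, so by_pos holds the same values as generate age would.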

Bag: A bag is an unordered collection of tuples. Since bags are unordered, we cannot reference a tuple in a bag by its position. Bags are also not required to declare a schema. When a bag does have a schema, it describes all the tuples in the bag.

Bag constants are constructed using braces, with commas separating the tuples inside the bag.

 {('John', 25), ('Nathan', 30)}  

The above constructs a bag with two tuples.
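
In practice you will mostly meet bags as the result of a group statement: each output tuple holds the group key plus a bag of all the input tuples that share that key. A minimal sketch, again assuming the 'StudentData' file with the (name, age) schema used earlier; the aliases grouped and counts are made up for illustration.

 grunt> data = load 'StudentData' as (name:chararray, age:int);
 grunt> grouped = group data by age;
 grunt> describe grouped;
 grunt> counts = foreach grouped generate group, COUNT(data);
 grunt> dump counts;

describe grouped should show a field named group next to a bag named data, and COUNT(data) counts the tuples in that bag for each age.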

For more info: http://morebigdata.blogspot.in/2012/09/pignalytics-pigs-eat-anything-reading.html

Upvotes: 2
