Chhaya Vishwakarma

Reputation: 1437

Text parsing using PIG

I am new to Pig and don't know much about it. How can I parse text in Pig? To read field values, Pig has positional parameters: for example, $0 corresponds to the first field. Is there a similar feature that reads the entire row? Also, what is RADOOP, and where exactly can it be used?

Upvotes: 1

Views: 6414

Answers (4)

Guy

Reputation: 12939

Your question suggests that you want to work with your data interactively, but that the data is large in volume.

RADOOP is a combination of R and Hadoop, and it should be able to provide you with a GUI for running R statistical analysis on your big data with Hadoop-scale processing.

In the meantime, I suggest you take a look at Google-Refine (http://code.google.com/p/google-refine/), which you can easily download and use to run your Data Evidence process.

With Google-Refine you can easily parse your data using built-in text, date, and numeric functions. You can also use Jython to extend the functionality you need. It can handle large data sets by sampling your data, and you can investigate the data's features using built-in Facets.

R is also a great tool for Data Evidence, with good sampling and other statistical analysis libraries. However, its interface is command-line based, and it is aimed at advanced statisticians and analysts rather than the common user.

Upvotes: 1

zjffdu

Reputation: 28954

I guess you are asking how to avoid tokenizing the row and instead take the entire row as a single field, right?

Then I think you can use PigStorage('\n'): with '\n' as the field delimiter, the entire row is treated as one field.

And I think by "RADOOP" you mean Hadoop, right? As a first step, you can run Pig in local mode, which means you do not need to install Hadoop.

Upvotes: 0

Debaditya

Reputation: 2497

For text parsing, you can start by reading the Pig tutorials and the wordcount example.

The links are given below:

  1. Pig tutorial

  2. Wordcount example - Read the wordcount example from this link and relate it to the commands given in the tutorial.

Upvotes: 0

SquareCog

Reputation: 19676

I am not really sure what you are asking. Pig has a number of functions such as TOKENIZE and regex matching / extraction UDFs which can be helpful. Naturally, you can write any text processing code you like in Java or Python, too, and invoke it.
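
To illustrate the last point, here is a minimal sketch of a Python UDF that parses one whole line of text. It is only an illustration: the log format, the file name parse_udf.py, the alias myudfs, and the function name parse_line are all hypothetical, and it assumes a Pig version that supports CPython UDFs through streaming_python (under Jython, Pig supplies the outputSchema decorator itself, so the import would not be needed).

```python
import re

try:
    # Assumption: running as a CPython ("streaming_python") UDF, where
    # Pig ships the pig_util module that provides outputSchema.
    from pig_util import outputSchema
except ImportError:
    # Fallback so the file also runs outside Pig, e.g. for local testing.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

# Hypothetical line format: "<day> <level> <free text>",
# e.g. "2013-05-01 ERROR disk full".
LINE_PATTERN = re.compile(r"^(\S+)\s+(\S+)\s+(.*)$")


@outputSchema("parsed:tuple(day:chararray, level:chararray, message:chararray)")
def parse_line(line):
    """Split one raw line into (day, level, message); None if it does not match."""
    if line is None:
        return None
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.groups()


# Possible invocation from a Pig script (file and alias names are assumptions):
#   REGISTER 'parse_udf.py' USING streaming_python AS myudfs;
#   lines  = LOAD 'input.txt' USING TextLoader() AS (line:chararray);
#   parsed = FOREACH lines GENERATE myudfs.parse_line(line);
```

Loading with TextLoader gives each row as a single chararray field, so the UDF receives the entire line and can apply whatever parsing logic the data needs. For simple cases, the built-in TOKENIZE and regex functions mentioned above may be enough without any UDF.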

Upvotes: 0
