Reputation: 1437
I am new to Pig and don't know much about it. How can I parse text in Pig? To read field values, Pig has the concept of positional parameters; for example, $0 corresponds to the first field. Is there a similar feature that can read an entire row? Also, what is RADOOP, and where exactly can it be used?
Upvotes: 1
Views: 6414
Reputation: 12939
Your question suggests that you would like some kind of interactive mode for working with your data, but that the data is large.
RADOOP is a combination of R and Hadoop, and it should be able to provide you with a GUI for running your Big Data through R statistical analysis with Hadoop-scale processing.
In the meantime, I suggest you take a look at Google-Refine (http://code.google.com/p/google-refine/), which you can easily download and use to run your Data Evidence process.
With Google-Refine you can easily parse your data using built-in text, date and numeric functions. You can also use Jython to extend the functionality where needed. It can handle large data sets by sampling your data, and it lets you investigate the data's features using built-in Facets.
R is also a great tool for Data Evidence, with good sampling and other statistical analysis libraries. But its interface is command-line based, and it is targeted at advanced statisticians and analysts rather than the common user.
Upvotes: 1
Reputation: 28954
I guess you are asking how to avoid tokenizing the row and instead take the entire row as a single field, right?
If so, I think you can use PigStorage('\n'): using '\n' as the field delimiter treats the entire row as one field.
And I think by "RADOOP" you mean Hadoop, right? As a first step, you can run Pig in local mode, which means you do not need to install Hadoop.
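The effect of the delimiter choice can be modelled outside Pig. The Python sketch below is only an illustration, not Pig code: `split_fields` is a hypothetical helper that mimics how a delimiter-based loader divides one line into positional fields ($0, $1, ...). When the delimiter cannot occur inside a line, the whole row comes back as a single field.

```python
def split_fields(line, delimiter="\t"):
    """Hypothetical stand-in for a delimiter-based loader: returns positional fields."""
    # Drop the trailing newline first, as a line-oriented reader would.
    return line.rstrip("\n").split(delimiter)


row = "alice\t30\tlondon\n"

# Tab as delimiter: three positional fields, comparable to $0, $1 and $2.
fields = split_fields(row, "\t")

# Newline as delimiter: a single line never contains one after the reader
# has stripped it, so the entire row ends up in one field (comparable to $0).
whole = split_fields(row, "\n")

print(fields)
print(whole)
```

Whether Pig itself behaves exactly this way for every input is an assumption of this sketch; it is worth checking on a small sample file in local mode.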
Upvotes: 0
Reputation: 2497
For text parsing, start by reading the Pig tutorials and the wordcount example.
Links given below:
Wordcount example - Read the wordcount example from this link and relate it to the commands given in the tutorial.
Upvotes: 0
Reputation: 19676
I am not really sure what you are asking. Pig has a number of functions such as TOKENIZE and regex matching / extraction UDFs which can be helpful. Naturally, you can write any text processing code you like in Java or Python, too, and invoke it.
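As a rough illustration of the kind of text processing code meant here, a small Python sketch follows. It is plain Python, not a registered Pig UDF: the function names `tokenize` and `extract_first` and the sample log line are made up for this example, and wiring the functions into Pig (registration, output schemas) is left out.

```python
import re


def tokenize(line):
    """Split a line into words on whitespace, similar in spirit to TOKENIZE."""
    return line.split()


def extract_first(line, pattern):
    """Return capture group 1 of the first match of pattern, or None if no match."""
    match = re.search(pattern, line)
    return match.group(1) if match else None


# Hypothetical input line, used only for illustration.
line = "2012-05-01 ERROR disk full on node7"

print(tokenize(line))
print(extract_first(line, r"node(\d+)"))
```

Returning None when nothing matches keeps the function safe to apply to every row, including rows that do not have the expected shape.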
Upvotes: 0