YuliaPro

Reputation: 305

Using Pig to load a file

I am very new to Pig and am having what feels like a very basic problem. I have a line of code that reads:

A = load 'Sites/trial_clustering/shortdocs/*'
      AS (word1:chararray, word2:chararray, word3:chararray, word4:chararray);

where each file is basically a single line of 4 comma-separated words. However, Pig is not splitting the line into the 4 words. When I run DUMP A, I get: (Money, coins, loans, debt,,,). I have tried googling and cannot find what format my file needs to be in for Pig to interpret it properly. Please help!

Upvotes: 11

Views: 26453

Answers (1)

Donald Miner

Reputation: 39943

Your problem is that Pig, by default, loads files delimited by tabs, not commas. The whole line, "Money, coins, loans, debt", is ending up in your first column, word1. When you print it, you get the illusion that you have multiple columns, but really the first field holds the entire line and the others are null.

To fix this, tell PigStorage to split on commas instead:

A = LOAD '...' USING PigStorage(',') AS (...);
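
Applied to the load statement from the question, this might look like the sketch below. The TRIM step and the alias B are additions for illustration only, not part of the fix itself. They assume each comma in the input is followed by a space, as the dump output in the question suggests; if the files contain no such spaces, that step can be dropped.

```pig
-- Split each line on commas instead of the default tab delimiter.
A = LOAD 'Sites/trial_clustering/shortdocs/*' USING PigStorage(',')
      AS (word1:chararray, word2:chararray, word3:chararray, word4:chararray);

-- Assumption: fields carry a leading space after each comma, so trim them.
B = FOREACH A GENERATE TRIM(word1) AS word1, TRIM(word2) AS word2,
                       TRIM(word3) AS word3, TRIM(word4) AS word4;

DUMP B;
```

With the comma delimiter in place, each word should land in its own field rather than all four ending up in word1.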

Upvotes: 27
