I'm trying to run an embedded Pig script (embedded in Python) where I need to take the output of the script and feed it back into the script as the input. I'm sure there is an easy way to do this, but all the examples seem overly simplistic and use only one-column data.
My input looks like this: networkMap.csv:
NodeH,4,-0.4
NodeH,5,0.2
NodeO,6,0.1
Link,W_1_4,0.2,1,4
Link,W_1_5,-0.3,1,5
Link,W_2_4,0.4,2,4
Link,W_2_5,0.1,2,5
Link,W_3_4,-0.5,3,4
Link,W_3_5,-0.2,3,5
Link,W_4_6,-0.3,4,6
Link,W_5_6,-0.2,5,6
LR,LR,0.9
Target,Target,1
Let's take a super simple example of what I want to do, stripping out all of the application logic to focus on the input/output problem:
#!/usr/bin/python
from org.apache.pig.scripting import *
P = Pig.compile("""
A = LOAD '$input' using PigStorage(',') AS (type:chararray, name:chararray, val:double,iName:chararray,jName:chararray);
STORE A INTO '$outFile' USING PigStorage (',');
""")
params = { 'input': 'networkMap.csv'}
for i in range(2):
    outDir = "out_" + str(i + 1)
    inputString = ""
    params["outFile"] = "out_" + str(i + 1)
    bound = P.bind(params)
    stats = bound.runSingle()
    if not stats.isSuccessful():
        raise RuntimeError('failed')
    params["input"] = stats.result("Output1")
I was hoping that I could just say input = output but that doesn't work. I've also tried:
input = ""
iter = stats.result("A").iterator()
while iter.hasNext():
    tuple = iter.next()
    input = input + "(" + tuple.toDelimitedString(",") + ")"
params["input"] = input
This did push the output back into the input, but then the LOAD statement couldn't read it, since it looked like one big record:
A = LOAD '(NodeI,1,1.0,,)(NodeI,2,0.0,,)(NodeI,3,1.0,,)(NodeH,4,-0.4,,)(NodeH,5,0.2,,)(NodeO,6,0.1,,)(Link,W_1_4,0.2,1,4)(Link,W_1_5,-0.3,1,5)(Link,W_2_4,0.4,2,4)(Link,W_2_5,0.1,2,5)(Link,W_3_4,-0.5,3,4)(Link,W_3_5,-0.2,3,5)(Link,W_4_6,-0.3,4,6)(Link,W_5_6,-0.2,5,6)(LR,LR,0.9,,)(Target,Target,1.0,,)' using PigStorage(',') AS (type:chararray, name:chararray, val:double,iName:chararray,jName:chararray);
I'm sure I am missing some simple way of doing this.
Quick answer: change
params["input"] = stats.result("Output1")
to
params["input"] = params["outFile"]
Explanation: Remember, your params dictionary is for parameter substitution within your Pig script. That's why your next LOAD statement looked the way it did. You took the output of the previous run and said "take these results, put them into a string, and then interpret this string as the filename of the input data".
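To make the substitution point concrete, here is a rough illustration in plain Python. It does not use Pig at all: `string.Template` is only a stand-in that happens to share the `$name` syntax, and `SCRIPT` and `render` are hypothetical names introduced for this sketch. It shows that whatever string you put in `input` is pasted into the LOAD statement as if it were a path.

```python
from string import Template

# Stand-in for the Pig script text (assumption: Pig substitutes parameters
# textually, much like string.Template does here).
SCRIPT = Template(
    "A = LOAD '$input' USING PigStorage(',');\n"
    "STORE A INTO '$outFile' USING PigStorage(',');"
)

def render(params):
    """Substitute params into the script text, purely as strings."""
    return SCRIPT.substitute(params)

# A path works as expected.
good = render({"input": "networkMap.csv", "outFile": "out_1"})
# Stringified tuples end up quoted as if they were a file name.
bad = render({"input": "(NodeH,4,-0.4,,)(NodeH,5,0.2,,)", "outFile": "out_2"})

print(good.splitlines()[0])
print(bad.splitlines()[0])
```

The second printed line has the same shape as the broken LOAD statement in the question, which is why Pig could not read it.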
You are almost there. You have two elements in your params dictionary: input and outFile. Your script LOADs from input and STOREs into outFile. So after you have run the script, set input = outFile. Then your next iteration will LOAD from outFile. Just be sure to specify a new outFile, or you will be unable to STORE because the directory will already exist.
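A minimal sketch of that loop, with the Pig call factored out so the bookkeeping can be tried without a cluster. `chain_runs` and `run_once` are hypothetical names for this example; under embedded Pig, `run_once` would presumably wrap something like `P.bind(p).runSingle().isSuccessful()` using the `P` from the question.

```python
def chain_runs(run_once, first_input, iterations, out_prefix="out_"):
    """Run the script repeatedly, feeding each run's output path to the next."""
    params = {"input": first_input}
    history = []
    for i in range(iterations):
        # Fresh output directory per iteration; STORE fails if it already exists.
        params["outFile"] = out_prefix + str(i + 1)
        if not run_once(dict(params)):
            raise RuntimeError("iteration %d failed" % (i + 1))
        history.append((params["input"], params["outFile"]))
        # The next LOAD reads the path that this run just wrote.
        params["input"] = params["outFile"]
    return history

# Hypothetical stand-in for the Pig call so the sketch runs anywhere.
fake_run = lambda p: True
print(chain_runs(fake_run, "networkMap.csv", 2))
```

With two iterations, the first run reads networkMap.csv and writes out_1, and the second reads out_1 and writes out_2.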