user3082069

Reputation: 3

Piping a pipe-delimited flat file into python for use in Pandas and Stats

I have searched a lot, but haven't found an answer to this.

I am trying to pipe a flat file of data into Python and get it into a form that Python can read and that I can analyze (for instance, with a t-test).

First, I created a simple pipe-delimited flat file:

1|2
3|4
4|5
1|6
2|7
3|8
8|9

and saved it as "simpledata".

Then I wrote a script in nano:

#!/usr/bin/env python

import sys
from scipy import stats 

A = sys.stdin.read()
print A
paired_sample = stats.ttest_rel(A[:,0],A[:,1])
print "The t-statistic is %.3f and the p-value is %.3f." % paired_sample

Then I saved the script as pairedttest.sh and ran it as

 cat simpledata | pairedttest.sh

The error I get is

TypeError: string indices must be integers, not tuple

Thanks in advance for your help.

Upvotes: 0

Views: 890

Answers (1)

dano

Reputation: 94951

Are you trying to make this call?

paired_sample = stats.ttest_rel([1,3,4,1,2,3,8], [2,4,5,6,7,8,9])

If so, it can't work as written. sys.stdin.read() returns a single string, so A is just a string, and a string can't be indexed with A[:,0] as if it were a two-dimensional array. That is what the TypeError is telling you. You need to build the two lists from the string. The most obvious way is like this:

left = []
right = []
for line in A.splitlines():
    l, r = line.split("|")
    left.append(int(l))
    right.append(int(r))
print left
print right

This will output:

[1, 3, 4, 1, 2, 3, 8]
[2, 4, 5, 6, 7, 8, 9]

You can then call stats.ttest_rel(left, right).
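The loop above can also be wrapped in a small function, which makes it easier to reuse and to check on its own. This is only a sketch: the name parse_pairs and its sep parameter are made up for illustration, it is written for Python 3 (print is a function), and it skips blank lines, which the original loop does not.

```python
def parse_pairs(text, sep="|"):
    """Split delimited text into two lists of ints, one list per column.

    parse_pairs is a hypothetical helper name, not part of SciPy or pandas.
    """
    left, right = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines, e.g. an empty line at the end
        l, r = line.split(sep)
        left.append(int(l))
        right.append(int(r))
    return left, right


# Same contents as the "simpledata" file from the question.
sample = "1|2\n3|4\n4|5\n1|6\n2|7\n3|8\n8|9\n"
left, right = parse_pairs(sample)
print(left)   # [1, 3, 4, 1, 2, 3, 8]
print(right)  # [2, 4, 5, 6, 7, 8, 9]

# In the piped script, read stdin instead of using `sample`:
#     import sys
#     from scipy import stats
#     left, right = parse_pairs(sys.stdin.read())
#     t_stat, p_value = stats.ttest_rel(left, right)
```

Keeping the parsing separate from the statistics means the t-test line stays a single call, and a malformed row fails inside parse_pairs rather than somewhere inside SciPy.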

Or, to be really clever, you can make a (nearly impossible to read) one-liner out of it:

z = zip(*[map(int, line.split("|")) for line in A.splitlines()])

In Python 2, where zip returns a list, z is then:

[(1, 3, 4, 1, 2, 3, 8), (2, 4, 5, 6, 7, 8, 9)]

So you can call stats.ttest_rel(*z).
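One caveat if you are on Python 3 rather than the Python 2 used in the question: there, zip returns a lazy iterator instead of a list, so printing z shows a zip object and z can only be consumed once. A minimal sketch of the same one-liner for Python 3, using a hard-coded A in place of sys.stdin.read():

```python
# Stand-in for A = sys.stdin.read(); same contents as "simpledata".
A = "1|2\n3|4\n4|5\n1|6\n2|7\n3|8\n8|9\n"

# list(...) materialises the zip iterator so z can be printed and reused.
z = list(zip(*[map(int, line.split("|")) for line in A.splitlines()]))
print(z)  # [(1, 3, 4, 1, 2, 3, 8), (2, 4, 5, 6, 7, 8, 9)]

# z holds one tuple per column, so it unpacks into the two samples.
left, right = z
```

With z built this way, stats.ttest_rel(*z) works the same as in Python 2.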

Upvotes: 1
