wiedzminYo

Reputation: 561

Don't understand output of Pandas.Series.from_csv()

I have three .txt files with data, each with 4 columns of numbers. I need to load them into one DataFrame (dimension [3, n], where n is the length of a column). Because I only need one column from each file, I decided to use the Series.from_csv() function, but I cannot make sense of its output. I have written this code:

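import glob
import pandas as pd

names = glob.glob("*.txt")
for i in names:
    # Series.from_csv was deprecated in pandas 0.21 and removed in 1.0.
    rank = pd.Series.from_csv(i, sep=" ", index_col=3)
    print(rank)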

This prints one column of my data (that's good), but also a column filled entirely with zeros, like this:

0.039157    0
0.039001    0
0.038524    0
0.038579    0
0.038385    0

What I find even more bizarre is that when I use

rank = pd.Series.from_csv(i, sep=" ", index_col=3).values

I got this:

[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]

Does that mean these zeros are the values read from the files? Then what is the first column from before? I have tried many methods, but I still don't understand this.
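A minimal sketch of how pandas prints a Series may help here. In Series.from_csv, index_col=3 makes the fourth column the index, so the floats on the left of the printout are the index, and the zeros on the right are the Series values, taken from another column of the file (presumably the first remaining one, which would then contain only zeros). The data below is made up to mirror the printout; it does not read the actual files.

```python
import pandas as pd

# Hypothetical data mirroring the printout above: the floats are the
# *index*, the zeros are the *values*.
rank = pd.Series([0, 0, 0], index=[0.039157, 0.039001, 0.038524])

print(rank)         # left column = index, right column = values
print(rank.index)   # only the floats
print(rank.values)  # only the zeros, as a NumPy array
```

This is why .values shows nothing but zeros: it drops the index, and the column that was actually wanted ended up as the index.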

Upvotes: 2

Views: 182

Answers (1)

jezrael

Reputation: 862511

I think you can use the more common read_csv with delim_whitespace=True, and usecols to select the column. First append all DataFrames to a list dfs, then combine them with concat:

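import glob
import pandas as pd

dfs = []
names = glob.glob("*.txt")
for i in names:
    # header=None, assuming the files have no header row (otherwise row 1 becomes the column names).
    # delim_whitespace is deprecated since pandas 2.2; sep=r'\s+' is the equivalent.
    rank = pd.read_csv(i, delim_whitespace=True, header=None, usecols=[3])
    print(rank)
    dfs.append(rank)

df = pd.concat(dfs, axis=1)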

Or with sep=r'\s+', which treats any run of whitespace as the separator:

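import glob
import pandas as pd

dfs = []
names = glob.glob("*.txt")
for i in names:
    # Raw string r'\s+' avoids Python's invalid-escape warning for '\s'.
    rank = pd.read_csv(i, sep=r'\s+', header=None, usecols=[3])
    print(rank)
    dfs.append(rank)

df = pd.concat(dfs, axis=1)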
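The difference between sep=" " (used in the question) and sep=r'\s+' matters if the files separate columns with more than one space. That is an assumption, since the raw files are not shown. A small sketch with invented data:

```python
import io

import pandas as pd

# Invented sample: columns separated by a varying number of spaces.
text = "0  1 2 0.039157\n0  1 2 0.039001\n"

# sep=" " treats every single space as a delimiter, so the double space
# produces an extra, empty (NaN) column.
single = pd.read_csv(io.StringIO(text), sep=" ", header=None)

# sep=r"\s+" treats any run of whitespace as one delimiter.
runs = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

print(single.shape)  # (2, 5)
print(runs.shape)    # (2, 4)
```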

You can also use a list comprehension:

import glob
import pandas as pd

files = glob.glob("*.txt")
dfs = [pd.read_csv(fp, delim_whitespace=True, header=None, usecols=[3]) for fp in files]
df = pd.concat(dfs, axis=1)
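A self-contained sketch of the whole approach, with io.StringIO stand-ins for the three .txt files (the file names and contents are invented). It uses header=None, on the assumption that the files have no header row, names each column after its file, and transposes the result to get the [3, n] shape mentioned in the question:

```python
import io

import pandas as pd

# Invented stand-ins for the three .txt files: four whitespace-separated
# numeric columns, no header row.
files = {
    "a.txt": "1 2 3 0.25\n1 2 3 0.5\n",
    "b.txt": "4 5 6 0.75\n4 5 6 1.25\n",
    "c.txt": "7 8 9 1.5\n7 8 9 1.75\n",
}

# header=None keeps the first data row from becoming column names;
# usecols=[3] keeps only the fourth column.
dfs = [
    pd.read_csv(io.StringIO(text), sep=r"\s+", header=None, usecols=[3])
    for text in files.values()
]

df = pd.concat(dfs, axis=1)  # shape (n, 3): one column per file
df.columns = list(files)     # replace the duplicate label 3 with file names
wide = df.T                  # shape (3, n), as asked in the question

print(df.shape)    # (2, 3)
print(wide.shape)  # (3, 2)
```

With real files, the dictionary would be replaced by glob.glob("*.txt") and the file paths passed straight to read_csv.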

Upvotes: 2
