BeeHunter

Reputation: 13

Pandas.read_csv not reading full header

I have a csv file that has positions and velocities of particles saved like this:

x, y, z, vx, vy, vz
-0.960, 0.870, -0.490, 962.17, -566.10, 713.40
1.450, 0.777, 2.270, -786.27, 63.31, -441.00
-3.350, -1.640, 1.313, 879.20, 637.76, -556.24
-0.504, 2.970, -0.278, 613.22, -717.32, 557.02
0.338, 0.220, 0.090, -927.18, -778.77, -443.05
...
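By default `read_csv` splits only on the comma, so a space that follows a comma stays part of the next field. That includes the header names. The sketch below assumes the header line looks exactly like the sample above. It reads the first rows from an in-memory string (`CSV_TEXT` is a hypothetical stand-in for `snap.csv`) and prints the raw column labels:

```python
import io

import pandas as pd

# First rows of the sample file, copied verbatim (note the space after each comma).
CSV_TEXT = """x, y, z, vx, vy, vz
-0.960, 0.870, -0.490, 962.17, -566.10, 713.40
1.450, 0.777, 2.270, -786.27, 63.31, -441.00
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# The list repr makes any leading whitespace in the labels visible.
print(list(df.columns))  # ['x', ' y', ' z', ' vx', ' vy', ' vz']
```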

I'm trying to read this file into a pandas DataFrame in a script using read_csv, but I get an error when accessing any column except the first one:

AttributeError: 'DataFrame' object has no attribute 'y'

I never get the error for the 'x' column, so I wrote a snippet to find out where the reading error was coming from.

import pandas as pd
data = pd.read_csv('snap.csv')
print(data)
print(data.x)
print(data.y)  # raises AttributeError

The console correctly prints:

          x      y      z       vx       vy       vz       
0    -0.960  0.870 -0.490   962.17  -566.10   713.40   
1     1.450  0.777  2.270  -786.27    63.31  -441.00   
2    -3.350 -1.640  1.313   879.20   637.76  -556.24  
3    -0.504  2.970 -0.278   613.22  -717.32   557.02  
4     0.338  0.220  0.090  -927.18  -778.77  -443.05 
...

which suggests the columns are getting the correct names. Then printing data.x gives:

0      -0.960
1       1.450
2      -3.350
3      -0.504
4       0.338  
...

showing that it can extract that column correctly. But it then throws the error again when trying to print the second column:

AttributeError: 'DataFrame' object has no attribute 'y'
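Attribute-style access such as `data.y` only works when a column label is exactly that identifier. The sketch below assumes the header has a space after each comma, as in the sample file. It contrasts attribute access with bracket access on a two-column example (`CSV_TEXT` is a hypothetical stand-in for the file):

```python
import io

import pandas as pd

CSV_TEXT = "x, y\n-0.960, 0.870\n1.450, 0.777\n"
df = pd.read_csv(io.StringIO(CSV_TEXT))

# The second label is ' y' (with a leading space), so no attribute 'y' exists.
print(hasattr(df, "y"))    # False
print(" y" in df.columns)  # True

# Bracket indexing with the exact label still reaches the column.
print(df[" y"].tolist())
```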

I then looped through data.itertuples() and printed the first row to see what it looked like. That confirmed the name was only being assigned to the first column and none of the others:

Pandas(Index=0, x=-0.96, _2=0.87, _3=-0.49, _4=962.17, _5=-566.1, _6=713.4)
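The `_2`, `_3`, ... field names are a clue rather than a separate bug. The pandas `itertuples` documentation says column names are renamed to positional names if they are invalid Python identifiers, and a label such as `' y'` (leading space) is not a valid identifier. The sketch below assumes the header has a space after each comma and reproduces the renaming, both through pandas and with the standard-library rule it follows:

```python
import collections
import io

import pandas as pd

CSV_TEXT = "x, y, z\n-0.960, 0.870, -0.490\n"
df = pd.read_csv(io.StringIO(CSV_TEXT))

# pandas names the row fields "Index" followed by the column labels.
row = next(df.itertuples())
print(row._fields)  # ('Index', 'x', '_2', '_3')

# The same renaming rule from the standard library: with rename=True,
# any field name that is not a valid identifier becomes _<position>.
Row = collections.namedtuple("Pandas", ["Index", "x", " y", " z"], rename=True)
print(Row._fields)  # ('Index', 'x', '_2', '_3')
```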

There aren't any other problems with the data: the values all correspond to the right index. It's just that the names are not being assigned correctly, and only the first column can be accessed by name. I tried putting single quotes around each column name, and that produced exactly the same errors. I know there are workarounds, such as assigning the names in the read_csv call, but I'm curious what the actual issue is so I can avoid it in the future.
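The symptoms above are consistent with the header line having a space after each comma. In that case every label except the first starts with a space. The sketch below assumes that is the cause and shows two ways to get clean labels, both based on documented pandas features: the `skipinitialspace` option of `read_csv`, and stripping the labels with `columns.str.strip()` after loading. `CSV_TEXT` is a hypothetical stand-in for `snap.csv`.

```python
import io

import pandas as pd

CSV_TEXT = "x, y, z, vx, vy, vz\n-0.960, 0.870, -0.490, 962.17, -566.10, 713.40\n"
expected = ["x", "y", "z", "vx", "vy", "vz"]

# Option 1: have the parser skip spaces that follow each delimiter.
df1 = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)
print(list(df1.columns))  # ['x', 'y', 'z', 'vx', 'vy', 'vz']

# Option 2: read as before, then strip surrounding whitespace from the labels.
df2 = pd.read_csv(io.StringIO(CSV_TEXT))
df2.columns = df2.columns.str.strip()
print(df2.y.tolist())  # attribute access now works
```

Option 1 fixes the problem at parse time. Option 2 is handy when the DataFrame has already been loaded elsewhere.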

Upvotes: 1

Views: 5703

Answers (2)

asimo

Reputation: 2500

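# header=0 makes names replace the file's own header line instead of reading it as a data row
df = pd.read_csv("snap.csv", names=["x", "y", "z", "vx", "vy", "vz"], header=0)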
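According to the pandas `read_csv` documentation, passing `names` without `header` makes pandas treat the file as having no header row. The existing header line is then read as data. Passing `header=0` replaces it instead. The small sketch below shows the difference using an in-memory copy of the first rows (`CSV_TEXT` is a hypothetical stand-in for the file):

```python
import io

import pandas as pd

CSV_TEXT = "x, y, z, vx, vy, vz\n-0.960, 0.870, -0.490, 962.17, -566.10, 713.40\n"
names = ["x", "y", "z", "vx", "vy", "vz"]

# names= alone: the file is treated as headerless, so its header line becomes row 0.
with_header_row = pd.read_csv(io.StringIO(CSV_TEXT), names=names)
print(len(with_header_row))  # 2

# names= plus header=0: the first line is consumed as the header and replaced by names.
clean = pd.read_csv(io.StringIO(CSV_TEXT), names=names, header=0)
print(len(clean))  # 1
```

In the first case the stray header row also forces every column to an object (string) dtype.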

Upvotes: 0

Kenpachi

Reputation: 182

Try declaring column names when you create the data frame.

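# .to_numpy() drops the original labels, so columns= assigns these names instead of selecting existing ones
df = pd.DataFrame(pd.read_csv("file.csv").to_numpy(), columns=["x", "y", "z", "vx", "vy", "vz"])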
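When `pd.DataFrame` receives an existing DataFrame together with `columns=`, pandas treats the list as a selection of existing labels, not as new names. Any name that does not match a label (for example `y` versus `' y'`) therefore comes back as an all-NaN column. The sketch below assumes the file's header has a space after each comma and compares that with passing the underlying array (`CSV_TEXT` is a hypothetical stand-in for `file.csv`):

```python
import io

import pandas as pd

CSV_TEXT = "x, y\n-0.960, 0.870\n1.450, 0.777\n"
names = ["x", "y"]
raw = pd.read_csv(io.StringIO(CSV_TEXT))  # labels are 'x' and ' y'

# Given a DataFrame, columns= selects existing labels; 'y' does not exist, so it is all-NaN.
selected = pd.DataFrame(raw, columns=names)
print(selected["y"].isna().all())  # True

# Given a plain array, columns= assigns the names positionally.
renamed = pd.DataFrame(raw.to_numpy(), columns=names)
print(renamed["y"].isna().any())  # False
```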

Upvotes: 1
