Reputation: 1013
I have a CSV file (the first three rows are shown here):
HEIGHT,WEIGHT,AGE,GENDER,SMOKES,ALCOHOL,EXERCISE,TRT,PULSE1,PULSE2,YEAR
173,57,18,2,2,1,2,2,86,88,93
179,58,19,2,2,1,2,1,82,150,93
I am using pandas read_csv to read the file and put the values into columns. Here is my code:
import pandas as pd
import os
path='~/Desktop/pulse.csv'
path=os.path.expanduser(path)
my_data=pd.read_csv(path, index_col=False, header=None, quoting = 3, delimiter=',')
print(my_data)
The problem is that the first and last columns have " before and after the values.
Additionally, I can't get rid of the index.
I might be making some silly mistake, but thank you for your help in advance.
Upvotes: 0
Views: 324
Reputation: 862581
Final solution: use replace to remove the " characters and convert the result to ints, and use strip to remove " from the column names:
df = pd.read_csv('pulse.csv', quoting=3)
df = df.replace('"','', regex=True).astype(int)
df.columns = df.columns.str.strip('"')
print (df.head())
   HEIGHT  WEIGHT  AGE  GENDER  SMOKES  ALCOHOL  EXERCISE  TRT  PULSE1  \
0     173      57   18       2       2        1         2    2      86
1     179      58   19       2       2        1         2    1      82
2     167      62   18       2       2        1         1    1      96
3     195      84   18       1       2        1         1    2      71
4     173      64   18       2       2        1         3    2      90

   PULSE2  YEAR
0      88    93
1     150    93
2     176    93
3      73    93
4      88    93
index_col=False forces pandas not to use the first column as the index, but a DataFrame always needs some index, so a default one (0, 1, 2, ...) is added. It can therefore be omitted here.
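If the goal is only to hide the index when printing, a minimal sketch could look like the following. The two-column frame is a hypothetical stand-in built from the question's sample values, not the real file.

```python
import pandas as pd

# Hypothetical stand-in for the real data (values from the question's sample rows).
df = pd.DataFrame({"HEIGHT": [173, 179], "WEIGHT": [57, 58]})

# The default index is a RangeIndex: 0, 1, ...
print(df.index)

# to_string(index=False) renders the frame without the index column;
# the index still exists on the DataFrame itself.
text = df.to_string(index=False)
print(text)
```

The index is only left out of the printed text here; the DataFrame keeps it internally.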
header=None should be removed, because it stops pandas from reading the first row (the CSV header) as the DataFrame's column names. The header row is then treated as the first row of data, and the numeric values are converted to strings.
delimiter=',' should be removed too, because it is the same as sep=',', which is the default.
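The effect of header=None can be seen on a small in-memory sample. This is only a sketch: the three-column text below is a hypothetical, shortened imitation of the file, and it assumes each line of the real file is wrapped in double quotes.

```python
import io
import pandas as pd

# Hypothetical sample: every line wrapped in double quotes (an assumption
# about the real file, based on the symptoms described in the question).
raw = (
    '"HEIGHT,WEIGHT,AGE"\n'
    '"173,57,18"\n'
    '"179,58,19"\n'
)

# header=None: the header row is read as data, the columns are labelled
# 0, 1, 2 and every column holds strings.
no_header = pd.read_csv(io.StringIO(raw), header=None, quoting=3)

# Default header handling: the first row becomes the column labels and
# the unquoted middle column is parsed as integers.
with_header = pd.read_csv(io.StringIO(raw), quoting=3)

print(no_header)
print(with_header.dtypes)
```

With the defaults, only the first and last columns still carry the stray " characters, which is what the replace and strip calls above clean up.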
Upvotes: 2
Reputation: 155
@jezrael is right: a pandas DataFrame always has an index. It's necessary.
Try something like df[0] = df[0].str.strip('"'), then repeat with 0 replaced by the label of the last column.
Before you do so, read your CSV into a DataFrame. pd.DataFrame.from_csv(path) did this in older pandas versions; it has since been removed, so use pd.read_csv(path) instead.
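A minimal sketch of that per-column approach, again on a hypothetical three-column sample that assumes each line of the file is wrapped in double quotes:

```python
import io
import pandas as pd

# Hypothetical sample imitating the quoted lines of the real file.
raw = '"HEIGHT,WEIGHT,AGE"\n"173,57,18"\n"179,58,19"\n'
df = pd.read_csv(io.StringIO(raw), quoting=3)

# Only the first and last columns carry the stray quote characters.
for col in (df.columns[0], df.columns[-1]):
    df[col] = df[col].str.strip('"').astype(int)

# The same two column labels carry a quote as well.
df.columns = df.columns.str.strip('"')

print(df)
```

Unlike replace on the whole frame, this touches only the two affected columns.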
Upvotes: 0