A.E

Reputation: 1013

Pandas read_csv adds unnecessary " " to each row

I have a csv file

(I am showing the first three rows here)

HEIGHT,WEIGHT,AGE,GENDER,SMOKES,ALCOHOL,EXERCISE,TRT,PULSE1,PULSE2,YEAR
173,57,18,2,2,1,2,2,86,88,93
179,58,19,2,2,1,2,1,82,150,93

I am using pandas read_csv to read the file and put the values into columns.

Here is my code:

import os
import pandas as pd

path = '~/Desktop/pulse.csv'
path = os.path.expanduser(path)
my_data = pd.read_csv(path, index_col=False, header=None, quoting=3, delimiter=',')
print(my_data)

The problem is the first and last columns have " before and after the values.

Additionally I can't get rid of the indexes.

I might be making some silly mistake, but thank you for your help in advance.

Upvotes: 0

Views: 324

Answers (2)

jezrael

Reputation: 862581

Final solution - use replace to remove the " characters from the values and convert them to ints, and use strip to remove " from the column names:

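import pandas as pd

# quoting=3 is csv.QUOTE_NONE: quote characters are read as ordinary text
df = pd.read_csv('pulse.csv', quoting=3)
# remove the stray quotes from the values, then convert every column to int
df = df.replace('"', '', regex=True).astype(int)
# remove the stray quotes from the column names
df.columns = df.columns.str.strip('"')
print(df.head())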

   HEIGHT  WEIGHT  AGE  GENDER  SMOKES  ALCOHOL  EXERCISE  TRT  PULSE1  \
0     173      57   18       2       2        1         2    2      86   
1     179      58   19       2       2        1         2    1      82   
2     167      62   18       2       2        1         1    1      96   
3     195      84   18       1       2        1         1    2      71   
4     173      64   18       2       2        1         3    2      90   

   PULSE2  YEAR  
0      88    93  
1     150    93  
2     176    93  
3      73    93  
4      88    93  

index_col=False forces pandas not to use the first column as the index, but a DataFrame always needs some index, so the default one (0, 1, 2, ...) is added. The parameter can therefore be omitted here.
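A minimal sketch of the point about the index, assuming the sample rows from the question are held in an in-memory string (the names csv_text and text_without_index are only for illustration). The default index cannot be removed from the DataFrame, but it can be left out when printing:

```python
import io
import pandas as pd

# Sample rows from the question, kept in a string instead of a file (assumption).
csv_text = (
    "HEIGHT,WEIGHT,AGE,GENDER,SMOKES,ALCOHOL,EXERCISE,TRT,PULSE1,PULSE2,YEAR\n"
    "173,57,18,2,2,1,2,2,86,88,93\n"
    "179,58,19,2,2,1,2,1,82,150,93\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# pandas creates a default 0, 1, 2, ... index when none is specified.
print(df.index)

# The index stays on the DataFrame, but it can be hidden in the printed output.
text_without_index = df.to_string(index=False)
print(text_without_index)
```

If the goal is only a cleaner printout, hiding the index at output time is usually enough; the DataFrame itself keeps its index either way.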

header=None should be removed, because it stops pandas from using the first row (the CSV header) as the column names of the DataFrame. The header row is then read as the first row of data, so the numeric values are converted to strings.
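The effect of header=None can be checked on a small example. This is only a sketch, assuming a shortened three-column version of the file held in a string (csv_text, no_header and with_header are illustrative names):

```python
import io
import pandas as pd

# Shortened version of the question's data, used only for illustration.
csv_text = (
    "HEIGHT,WEIGHT,AGE\n"
    "173,57,18\n"
    "179,58,19\n"
)

# With header=None the header line becomes row 0 of the data, so every
# column mixes text and numbers and is read as strings, not integers.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(no_header.dtypes)

# With the default header handling the first line becomes the column names
# and the remaining rows are parsed as integers.
with_header = pd.read_csv(io.StringIO(csv_text))
print(with_header.dtypes)
```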

delimiter=',' should be removed too, because delimiter is an alias for sep, and sep=',' is already the default.
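A quick way to confirm that the explicit delimiter changes nothing, again as a sketch on a shortened in-memory sample (csv_text and the variable names are assumptions made for illustration):

```python
import io
import pandas as pd

# Shortened version of the question's data, used only for illustration.
csv_text = "HEIGHT,WEIGHT,AGE\n173,57,18\n179,58,19\n"

# Read once with the default separator and once with an explicit delimiter.
default_sep = pd.read_csv(io.StringIO(csv_text))
explicit_delimiter = pd.read_csv(io.StringIO(csv_text), delimiter=",")

# DataFrame.equals compares values, column labels and dtypes.
same_result = default_sep.equals(explicit_delimiter)
print(same_result)
```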

Upvotes: 2

smundlay

Reputation: 155

@jezrael is right - a pandas DataFrame always has an index. It's necessary.

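Try something like df[0] = df[0].str.strip('"') for the first column, then do the same for the last column (replace 0 with its label). Note that str.strip() with no argument only removes whitespace, so the " has to be passed explicitly.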

Before you do so, read your csv into a DataFrame. pd.DataFrame.from_csv(path) did this in older versions, but it was removed in pandas 1.0; use pd.read_csv(path) instead.
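A sketch of the column-by-column approach. The exact file layout is not shown in the question, so this assumes that each line of the file is wrapped in one pair of double quotes, and it uses a shortened three-column sample (csv_text, first and last are illustrative names):

```python
import io
import pandas as pd

# ASSUMED layout: every line wrapped in one pair of double quotes.
csv_text = (
    '"HEIGHT,WEIGHT,YEAR"\n'
    '"173,57,93"\n'
    '"179,58,93"\n'
)

# quoting=3 (csv.QUOTE_NONE) keeps the quote characters as ordinary text,
# so they end up glued to the first and last fields of each row.
df = pd.read_csv(io.StringIO(csv_text), quoting=3)

# Clean the column names first, then only the first and last columns.
df.columns = df.columns.str.strip('"')
first, last = df.columns[0], df.columns[-1]
df[first] = df[first].str.strip('"').astype(int)
df[last] = df[last].str.strip('"').astype(int)
print(df)
```

Compared with replacing quotes across the whole DataFrame, this touches only the two columns that can actually contain them.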

Upvotes: 0
