Tiberius

Reputation: 145

Having trouble removing headers when using pd.read_csv

I have a .csv file that contains column headers, shown below. I need to suppress the column labeling when I ingest the file as a data frame.

date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7

When I issue the following command:

 df = pd.read_csv('c:/temp1/test_csv.csv', usecols=[4,5], names = ["zip","weight"], header = 0, nrows=10)

I get:

zip               weight
0   1417464       3546600

I have tried various manipulations of header=True and header=0. If I don't use header=0, then the columns will all print out on top of the rows like so:

    zip           weight
    height        locale
0   1417464       3546600

I have tried skiprows=0 and skiprows=1, but neither removes the headers, even though the command does skip the specified line.

I could really use some additional insight or a solution. Thanks in advance for any assistance you could provide.


Upvotes: 6

Views: 13562

Answers (3)

jrovegno

Reputation: 719

Using @jezrael's example, if you want to skip the header and suppress the column labeling:

import io
import pandas as pd

temp = u"""date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7"""
# after testing, replace io.StringIO(temp) with the filename
# header=None makes pandas assign integer labels; skiprows=1 drops the header line
df = pd.read_csv(io.StringIO(temp), usecols=[4,5], header=None, skiprows=1)
print(df)
         4    5
0  3546600  254
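
A data frame always carries some column labels, so with header=None pandas falls back to integer positions. If the goal is only to hide the labels on output, a minimal sketch follows. It assumes a reasonably recent pandas (for to_numpy), and it uses usecols=[3, 4] to pick the zip and weight columns from the question's sample; the variable names are illustrative only.

```python
import io
import pandas as pd

csv_text = """date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7"""

# Skip the header line and let pandas assign integer column labels.
df = pd.read_csv(io.StringIO(csv_text), usecols=[3, 4], header=None, skiprows=1)

# The labels cannot be removed from the frame, only hidden when rendering it.
text_without_labels = df.to_string(header=False, index=False)
print(text_without_labels)

# The raw values are also available as a plain NumPy array, with no labels at all.
values = df.to_numpy()
print(values)
```

Writing the frame back out with df.to_csv(path, header=False, index=False) should give a file without the header line as well.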

Upvotes: 7

jezrael

Reputation: 862661

I think you are right.

So you can change column names to a and b:

import io
import pandas as pd

temp = u"""date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7"""
# after testing, replace io.StringIO(temp) with the filename
# header=0 consumes the file's header line; names supplies the replacement labels
df = pd.read_csv(io.StringIO(temp), usecols=[4,5], names=["a","b"], header=0, nrows=10)
print(df)
         a    b
0  3546600  254

Now these columns have new names instead of weight and height.

df = pd.read_csv(io.StringIO(temp), usecols=[4,5], header=0, nrows=10)
print(df)
    weight  height
0  3546600     254

You can check the read_csv docs:

header : int, list of ints, default ‘infer’

Row number(s) to use as the column names, and the start of the data. Defaults to 0 if no names passed, otherwise None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns E.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example are skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
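
To make the interaction between header and names concrete, here is a small sketch of the three cases the quoted passage describes. It reuses the question's sample data but selects usecols=[3, 4] (zip and weight); the variable names are illustrative only.

```python
import io
import pandas as pd

csv_text = """date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7"""

# names without header: header falls back to None, so the file's own
# header line is kept as the first data row.
kept_as_data = pd.read_csv(io.StringIO(csv_text), usecols=[3, 4], names=["a", "b"])

# names with header=0: the file's header line is consumed and replaced.
renamed = pd.read_csv(io.StringIO(csv_text), usecols=[3, 4], names=["a", "b"], header=0)

# header=0 alone: the file's own column names are used.
original = pd.read_csv(io.StringIO(csv_text), usecols=[3, 4], header=0)

print(kept_as_data)
print(renamed)
print(original)
```

The first case is what produces the "headers printed on top of the rows" effect described in the question.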

Upvotes: 0

Justin O Barber

Reputation: 11591

I'm not sure I entirely understand why you want to remove the headers, but you could comment out the header line as follows, as long as the character 'd' does not appear anywhere else in the file (pandas stops parsing a line at the comment character, wherever it occurs):

>>> df = pd.read_csv('test.csv', usecols=[3,4], header=None, comment='d')  # comments out lines beginning with 'date,color' . . .
>>> df
         3        4
0  1417464  3546600

It would be better to comment out the line in the csv file with the crosshatch character (#) and then use the same approach (again, as long as '#' does not appear anywhere else in the file):

>>> df = pd.read_csv('test.csv', usecols=[3,4], header=None, comment='#')   # comments out lines with #
>>> df
         3        4
0  1417464  3546600
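
For anyone who wants to try this without editing a file, here is a self-contained sketch of both variants using in-memory data. The three-column sample and the second data row ("Red") are made up for illustration; they are not from the question's file.

```python
import io
import pandas as pd

# Header line marked with '#', then skipped via comment='#'.
marked = """#date,color,id
11/25/2013,Blue,122468"""
df_hash = pd.read_csv(io.StringIO(marked), header=None, comment="#")

# Pitfall with comment='d': the header line is dropped as intended, but the
# hypothetical second data row is cut off at the 'd' in "Red".
unmarked = """date,color,id
11/25/2013,Blue,122468
11/26/2013,Red,122469"""
df_d = pd.read_csv(io.StringIO(unmarked), header=None, comment="d")

print(df_hash)
print(df_d)
```

The second frame shows why a character that can occur in the data is a risky choice for comment: the truncated row ends up with a shortened field and a missing value.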

Upvotes: 0
