matcha latte

Reputation: 185

How to skip unwanted columns when reading a CSV without a header row in pandas read_csv

Currently I read the CSV file with the column names set in advance, and then drop the columns I don't want. Is there a way to leave those columns out directly while reading?

# Current code
import pandas as pd

columns_name = ['station', 'date', 'observation', 'value', 'other_1',
                'other_2', 'other_3', 'other_4']
del_columns_name = ['other_1', 'other_2', 'other_3', 'other_4']
df = pd.read_csv('filename', names=columns_name)
# drop() returns a new DataFrame, so assign the result back
df = df.drop(del_columns_name, axis=1)

Upvotes: 1

Views: 2652

Answers (2)

Anton vBR

Reputation: 18906

You can pass the indices of the columns you want to keep straight to usecols. In this case you are interested in [0, 1, 2, 3]. Consider this example, which also parses dates:

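from io import StringIO

import pandas as pd

cols = ['station', 'date', 'observation', 'value']

data = '''\
1, 2018-01-01, 1, 1, 1, 1, 1, 1
2, 2018-01-02, 2, 2, 2, 2, 2, 2'''

# pd.compat.StringIO is gone from recent pandas; io.StringIO works everywhere.
# skipinitialspace=True strips the blank after each comma in the sample data.
file = StringIO(data)
df = pd.read_csv(file, names=cols, usecols=[0, 1, 2, 3], parse_dates=[1],
                 skipinitialspace=True)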

print(df)

Output:

   station       date  observation  value
0        1 2018-01-01            1      1
1        2 2018-01-02            2      2
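If you would rather select columns by name than by position, a minimal sketch (assuming the same headerless eight-column layout as in the question; the sample rows are made up) is to name every column and list the ones to keep in usecols:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample rows: no header, four wanted columns plus four extras.
data = '''\
1,2018-01-01,1,1,1,1,1,1
2,2018-01-02,2,2,2,2,2,2'''

# Name every column in the file, then keep the wanted ones by name.
all_cols = ['station', 'date', 'observation', 'value',
            'other_1', 'other_2', 'other_3', 'other_4']
keep = ['station', 'date', 'observation', 'value']

df = pd.read_csv(StringIO(data), names=all_cols, header=None,
                 usecols=keep, parse_dates=['date'])
print(df)
```

Note that a list passed to usecols does not control column order: the columns come back in the order they appear in the file.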

Upvotes: 2

jpp

Reputation: 164623

One way is to use your two lists to work out the indices and names of the columns you want to keep.

Then pass them to the usecols and names arguments of pd.read_csv, which take the column indices and the column names respectively.

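# Keep (index, name) pairs for columns not marked for deletion,
# then split them into a tuple of indices and a tuple of names
idx, cols = zip(*((i, x) for i, x in enumerate(columns_name)
                  if x not in del_columns_name))

df = pd.read_csv('filename', usecols=list(idx), names=list(cols), header=None)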

As explained in the docs, you should also specify header=None explicitly when no header exists.

Explanation

  • Use a generator expression to iterate over columns_name and keep only the items not in del_columns_name.
  • Use enumerate to attach each column's index to its name.
  • Use zip with * unpacking to split the (index, name) pairs into separate tuples of indices and column names.
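A self-contained run of this approach, assuming hypothetical in-memory sample rows in place of 'filename'. It prints the resolved indices and names, then compares the result with a callable usecols, which pandas calls with each name from names and keeps the columns for which it returns True:

```python
from io import StringIO

import pandas as pd

# Lists from the question.
columns_name = ['station', 'date', 'observation', 'value', 'other_1',
                'other_2', 'other_3', 'other_4']
del_columns_name = ['other_1', 'other_2', 'other_3', 'other_4']

# Hypothetical stand-in for 'filename': two headerless rows of eight fields.
data = '''\
1,2018-01-01,1,1,1,1,1,1
2,2018-01-02,2,2,2,2,2,2'''

# Step 1: pair each position with its name, drop the pairs marked for
# deletion, and transpose what is left into indices and names.
idx, cols = zip(*((i, x) for i, x in enumerate(columns_name)
                  if x not in del_columns_name))
print(idx)   # (0, 1, 2, 3)
print(cols)  # ('station', 'date', 'observation', 'value')

# Step 2: read only those positions and label them with the kept names.
df = pd.read_csv(StringIO(data), usecols=list(idx), names=list(cols),
                 header=None)

# Alternative: name all eight columns and let a callable usecols
# reject the ones listed in del_columns_name.
df_alt = pd.read_csv(StringIO(data), names=columns_name, header=None,
                     usecols=lambda name: name not in del_columns_name)

# Both reads should produce the same four-column frame.
print(df.equals(df_alt))
```

The callable variant skips the index bookkeeping entirely, at the cost of having to name every column in the file up front.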

Upvotes: 2
