kaveh
kaveh

Reputation: 2146

Drop columns with no header on csv import with pandas

Here is a sample csv:

|  Header A |      | Unnamed: 1 |  Header D |
|-----------|------|------------|-----------|
| a1        | b1   | c1         | d1        |
| a2        | b2   | c2         | d2        |

If I import it with pandas.read_csv, it turns into this:

  Header A Unnamed: 1 Unnamed: 1.1 Header D
0      a1         b1           c1       d1
1      a2         b2           c2       d2

My goal is dropping all the columns with empty headers, in this case the second column, but I cannot use the assigned column names by pandas to filter them, because there might also be non-empty columns starting with Unnamed, like the third column in the example.

Columns are not known before hand, so I do not have any control over them.

I have tried the following args with read_csv, but have not had any luck with them:

I have looked at some other answers on SO, like the ones below, but none of them cover my case:

How to get rid of `Unnamed:` column in a pandas dataframe

Remove Unnamed columns in pandas dataframe

Upvotes: 0

Views: 3225

Answers (2)

Lior Cohen
Lior Cohen

Reputation: 5735

  1. Read your columns to list with df.columns
  2. create a tf_list with True/False based on your logic (search for None, Unnamed etc)
  3. filter_df = df.loc[:, tf_list]

Upvotes: 0

roganjosh
roganjosh

Reputation: 13175

The only way I can think of is to "peek" at the headers beforehand and get the indices of non-empty headers. Then it's not a case of dropping them, but not including them in the original df.

import csv

import pandas as pd

with open('test.csv') as infile:
    reader = csv.reader(infile)
    headers = next(reader)

header_indices = [i for i, item in enumerate(headers) if item]

df = pd.read_csv('test.csv', usecols=header_indices)

Upvotes: 2

Related Questions