Reputation: 2146
Here is a sample csv:
| Header A | | Unnamed: 1 | Header D |
|-----------|------|------------|-----------|
| a1 | b1 | c1 | d1 |
| a2 | b2 | c2 | d2 |
If I import it with pandas.read_csv
, it turns into this:
Header A Unnamed: 1 Unnamed: 1.1 Header D
0 a1 b1 c1 d1
1 a2 b2 c2 d2
My goal is dropping all the columns with empty headers, in this case the second column, but I cannot use the assigned column names by pandas to filter them, because there might also be non-empty columns starting with Unnamed
, like the third column in the example.
Columns are not known before hand, so I do not have any control over them.
I have tried the following args with read_csv
, but have not had any luck with them:
prefix
: it just does not work!usecols
: Empty headers already have a name when they are passed to usecols
, which makes it unusable to me.I have looked at some other answers on SO, like the ones below, but none of them cover my case:
How to get rid of `Unnamed:` column in a pandas dataframe
Remove Unnamed columns in pandas dataframe
Upvotes: 0
Views: 3225
Reputation: 5735
Upvotes: 0
Reputation: 13175
The only way I can think of is to "peek" at the headers beforehand and get the indices of non-empty headers. Then it's not a case of dropping them, but not including them in the original df.
import csv
import pandas as pd
with open('test.csv') as infile:
reader = csv.reader(infile)
headers = next(reader)
header_indices = [i for i, item in enumerate(headers) if item]
df = pd.read_csv('test.csv', usecols=header_indices)
Upvotes: 2