Lcy

Reputation: 345

Pandas - usecols only for columns that exist in the CSV

When the list passed to usecols contains a name that is not a column in the CSV, read_csv raises an error:

ValueError: Usecols do not match names.

How can I apply usecols only to the columns that actually exist in the CSV?

csv sample:

df.csv

AB,CD,EF,GH
foo,20160101,a,1
foo,20160102,a,3
foo,20160103,a,5

reading csv:

import pandas as pd


df = pd.read_csv('df.csv', 
    header=0,usecols=["AB", "CD", "IJ"])

This is what I'd like to get:

df

    AB        CD
0  foo  20160101
1  foo  20160102
2  foo  20160103

"IJ" is ignored because it is not in the CSV.

Upvotes: 4

Views: 11040

Answers (2)

Alex

Reputation: 1375

Pass a lambda as usecols to skip names that are not in the CSV:

import pandas as pd
from io import StringIO

txt = """AB,CD,EF,GH
foo,20160101,a,1
foo,20160102,a,3
foo,20160103,a,5"""

usecols = ['AB', 'CD', 'IJ']

df = pd.read_csv(StringIO(txt), usecols=lambda c: c in set(usecols))

print(df)

    AB        CD
0  foo  20160101
1  foo  20160102
2  foo  20160103

An explanation can be found in the pandas docs:

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.
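To make the quoted docs passage concrete, here is a small sketch of a callable that matches names case-insensitively, in the spirit of the docs' own x.upper() example. The names wanted and df_ci are just illustrative, and the lower-case wish list is an assumption, not something from the question.

```python
import pandas as pd
from io import StringIO

txt = """AB,CD,EF,GH
foo,20160101,a,1
foo,20160102,a,3
foo,20160103,a,5"""

# Hypothetical wish list in lower case; "ij" does not exist in the file.
wanted = {"ab", "cd", "ij"}

# The callable is evaluated against each header name; columns for which
# it returns True are kept, all others are skipped by the parser.
df_ci = pd.read_csv(StringIO(txt), usecols=lambda name: name.lower() in wanted)

print(list(df_ci.columns))
```

Because the callable is only asked about names that are actually in the header, a wanted name that is missing from the file never triggers the ValueError.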

Upvotes: 9

piRSquared

Reputation: 294488

Read the CSV normally:

import pandas as pd
from io import StringIO

txt = """AB,CD,EF,GH
foo,20160101,a,1
foo,20160102,a,3
foo,20160103,a,5"""

df = pd.read_csv(StringIO(txt))

print(df)

    AB        CD EF  GH
0  foo  20160101  a   1
1  foo  20160102  a   3
2  foo  20160103  a   5

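Then reindex with the intersection of the existing columns and usecols:

usecols = ['AB', 'CD', 'IJ']
# reindex_axis was removed in pandas 1.0; reindex(columns=...) is the replacement
df = df.reindex(columns=df.columns.intersection(usecols))

print(df)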

    AB        CD
0  foo  20160101
1  foo  20160102
2  foo  20160103
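Note that this approach parses every column before dropping the unwanted ones. If that matters for a wide file, one possible variation (a sketch, not part of the original answer) is to read only the header first with nrows=0 and then pass the names that exist to usecols. The names header, present and df_small are illustrative.

```python
import pandas as pd
from io import StringIO

txt = """AB,CD,EF,GH
foo,20160101,a,1
foo,20160102,a,3
foo,20160103,a,5"""

usecols = ['AB', 'CD', 'IJ']

# nrows=0 parses just the header row, giving an empty frame whose
# columns are the names found in the file.
header = pd.read_csv(StringIO(txt), nrows=0).columns

# Keep only the requested names that are really present, in the order requested.
present = [c for c in usecols if c in header]

# Second pass: parse only the columns that exist.
df_small = pd.read_csv(StringIO(txt), usecols=present)

print(df_small)
```

With a real file you would pass the same path to both read_csv calls; the cost is opening the file twice, in exchange for never parsing the unwanted columns.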

Upvotes: 0