Reputation: 345
Because the columns in the file and the names in my usecols list differ, read_csv raises a ValueError: "Usecols do not match names."
How can I pass usecols so that only the columns that actually exist in the CSV are read?
csv sample:
df.csv
AB,CD,EF,GH
foo,20160101,a,1
foo,20160102,a,3
foo,20160103,a,5
reading csv:
import pandas as pd
df = pd.read_csv('df.csv',
header=0,usecols=["AB", "CD", "IJ"])
This is what I'd like to get:
df
date AB CD
2016-01-01 a 1
2016-01-02 a 3
2016-01-03 a 5
The missing column "IJ" should simply be ignored.
Upvotes: 4
Views: 11040
Reputation: 1375
Use a lambda in usecols to skip columns that are not in the CSV:
import pandas as pd
from io import StringIO
txt = """AB,CD,EF,GH
foo,20160101,a,1
foo,20160102,a,3
foo,20160103,a,5"""
usecols = ['AB', 'CD', 'IJ']
df = pd.read_csv(StringIO(txt), usecols=lambda c: c in set(usecols))
print(df)
AB CD
0 foo 20160101
1 foo 20160102
2 foo 20160103
An explanation can be found in the pandas docs:
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.
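As a minimal sketch of the callable form described in that passage, the example below builds the set of requested names once, outside the lambda, and then lists which requested columns were absent from the file. The names `wanted` and `missing` are illustrative only and are not part of the pandas API.

```python
import pandas as pd
from io import StringIO

txt = """AB,CD,EF,GH
foo,20160101,a,1
foo,20160102,a,3
foo,20160103,a,5"""

# Hypothetical name: the columns we would like, whether or not they exist.
wanted = {"AB", "CD", "IJ"}

# The callable is evaluated against each column name found in the header,
# so a requested name that is not in the file ("IJ") is never matched
# and no ValueError is raised.
df = pd.read_csv(StringIO(txt), usecols=lambda name: name in wanted)

# Optional follow-up: report which requested columns were not in the file.
missing = sorted(wanted - set(df.columns))

print(list(df.columns))
print(missing)
```

The trade-off is that a misspelled column name is skipped silently, so checking `missing` afterwards may be worthwhile.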
Upvotes: 9
Reputation: 294488
Read the CSV normally:
import pandas as pd
from io import StringIO
txt = """AB,CD,EF,GH
foo,20160101,a,1
foo,20160102,a,3
foo,20160103,a,5"""
df = pd.read_csv(StringIO(txt))
print(df)
AB CD EF GH
0 foo 20160101 a 1
1 foo 20160102 a 3
2 foo 20160103 a 5
Then reindex with the intersection of the existing and the requested columns:
usecols = ['AB', 'CD', 'IJ']
df.reindex(columns=df.columns.intersection(usecols))
AB CD
0 foo 20160101
1 foo 20160102
2 foo 20160103
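The approach above parses every column before discarding the unwanted ones. A possible variant, sketched here under the assumption that reading the source twice is acceptable, reads only the header row first (`nrows=0`) and then passes the filtered list to usecols. The names `header` and `present` are illustrative.

```python
import pandas as pd
from io import StringIO

txt = """AB,CD,EF,GH
foo,20160101,a,1
foo,20160102,a,3
foo,20160103,a,5"""

usecols = ["AB", "CD", "IJ"]

# Read zero data rows: the result is an empty frame that still carries
# the column names from the header line.
header = pd.read_csv(StringIO(txt), nrows=0).columns

# Keep only the requested names that are present in the file.
present = [c for c in usecols if c in header]

# Second pass: parse just those columns.
df = pd.read_csv(StringIO(txt), usecols=present)
print(df)
```

With a real file, the same path would be passed to both read_csv calls in place of `StringIO(txt)`.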
Upvotes: 0