Reputation: 32336
The following code raises an error, but it works if I remove the usecols parameter.
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
import pandas as pd
u_cols = ['page_id','web_id']
audit_trail = StringIO('''
page_id | web_id
3|0
7|3
11|4
15|5
19|6
''')
df = pd.read_csv(audit_trail, sep="|", usecols = u_cols )
ValueError: Passed header names mismatches usecols
I need to use u_cols list because the column headings are being generated dynamically.
Upvotes: 3
Views: 16706
Reputation: 108
If someone else stumbles into this error, this is what just happened to me: I used parse_dates=[[0,1]] to merge and parse dates from two columns and got this error. The names parameter should contain the same number of columns as the original CSV, so I just added an extra empty string to the list: names=['column1','','column2',...].
Upvotes: 0
Reputation: 32336
"names" should be used instead of "usecols".
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
import pandas as pd
u_cols = ['page_id','web_id']
audit_trail = StringIO('''
page_id | web_id
3|0
7|3
11|4
15|5
19|6
''')
df11 = pd.read_csv(audit_trail, sep="|", names = u_cols )
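One caveat worth checking with this approach: when only names is passed, pandas appears to treat the file's own header line as an ordinary data row. The following is a minimal sketch, assuming Python 3 and a reasonably recent pandas, that compares names alone against names combined with header=0 (which asks pandas to replace the existing header). The variable names data, with_header_row and replaced are made up for illustration.

```python
from io import StringIO
import pandas as pd

u_cols = ['page_id', 'web_id']
data = '''
page_id | web_id
3|0
7|3
11|4
15|5
19|6
'''

# names alone: the original header line stays in the frame as a data row
with_header_row = pd.read_csv(StringIO(data), sep="|", names=u_cols)

# names plus header=0: the first non-blank line is treated as a header
# and replaced by the supplied names
replaced = pd.read_csv(StringIO(data), sep="|", names=u_cols, header=0)

print(with_header_row.head(2))
print(replaced.head(2))
```

If the header row shows up in the first frame, adding header=0 is probably what you want whenever the file already has a header line.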
Upvotes: 7
Reputation: 4051
This is because of the whitespace next to the | separator. When you run pd.read_csv(audit_trail, sep="|"), you actually get the columns ['page_id(whitespace)','(whitespace)web_id'] instead of ['page_id','web_id'].
I would suggest passing the regex pattern \s*\|\s* as your separator, which will remove any whitespace around the | separator. Here is the full solution...
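from io import StringIO
import pandas as pd

u_cols = ['page_id','web_id']
s = """page_id | web_id
3|0
7|3
11|4
15|5
19|6"""
# A regex separator is handled by the Python parsing engine; a raw string keeps the backslashes intact
df = pd.read_csv(StringIO(s), sep=r"\s*\|\s*", usecols=u_cols, engine="python")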
Output:
   page_id  web_id
0        3       0
1        7       3
2       11       4
3       15       5
4       19       6
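To see the whitespace problem directly, and as a possible alternative to the regex separator, here is a small sketch, assuming Python 3 and a pandas version where usecols accepts a callable. It first prints the raw column names, then selects columns by their stripped names and cleans the labels afterwards. The name raw is made up for illustration.

```python
from io import StringIO
import pandas as pd

u_cols = ['page_id', 'web_id']
s = """page_id | web_id
3|0
7|3
11|4
15|5
19|6"""

# Inspect the raw header: the padding around "|" ends up inside the column names
raw = pd.read_csv(StringIO(s), sep="|")
print(list(raw.columns))

# Alternative: match usecols against the stripped names, then strip the labels
df = pd.read_csv(StringIO(s), sep="|", usecols=lambda c: c.strip() in u_cols)
df.columns = df.columns.str.strip()
print(df)
```

This keeps the plain "|" separator, so the default C parser can still be used. The regex approach above does the cleanup in a single step instead.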
Upvotes: 3