shantanuo

Reputation: 32336

value error while matching column names

The following code raises an error, but it works if I remove the usecols parameter.

from io import StringIO
import pandas as pd

u_cols = ['page_id','web_id']
audit_trail = StringIO('''
page_id | web_id
3|0
7|3
11|4
15|5
19|6
''')

df = pd.read_csv(audit_trail, sep="|", usecols = u_cols  )

ValueError: Passed header names mismatches usecols

I need to use the u_cols list because the column headings are generated dynamically.

Upvotes: 3

Views: 16706

Answers (3)

gmolnar

Reputation: 108

If someone else stumbles into this error, this is what happened to me: I used parse_dates=[[0,1]] to merge and parse dates from two columns and got this error. The names parameter must contain the same number of columns as the original CSV, so I added an extra empty string to the list: names=['column1','','column2',...].

Upvotes: 0

shantanuo

Reputation: 32336

"names" should be used instead of "usecols".

from io import StringIO
import pandas as pd

u_cols = ['page_id','web_id']
audit_trail = StringIO('''
page_id | web_id
3|0
7|3
11|4
15|5
19|6
''')

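# header=0 makes the names list replace the file's own header row instead of reading it as data
df11 = pd.read_csv(audit_trail, sep="|", names=u_cols, header=0)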

Upvotes: 7

ZJS

Reputation: 4051

This is because of the whitespace next to the | separator. When you run pd.read_csv(audit_trail, sep="|") you actually get the columns ['page_id(whitespace)','(whitespace)web_id'] instead of ['page_id','web_id'].
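The stray whitespace can be confirmed by inspecting the parsed header directly. This is a minimal check, assuming Python 3 (io.StringIO) and a shortened copy of the sample data from the question; the variable names raw and cols are illustrative only.

```python
from io import StringIO
import pandas as pd

# Shortened copy of the question's data (illustrative).
raw = """page_id | web_id
3|0
7|3
"""

# With a plain "|" separator, the spaces around it stay in the header names.
cols = pd.read_csv(StringIO(raw), sep="|").columns.tolist()
print(cols)  # ['page_id ', ' web_id']
```

Neither name equals 'page_id' or 'web_id' exactly, which is why the usecols lookup fails.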

I would suggest passing the regex pattern \s*\|\s* as your separator, which will consume any whitespace around the | separator. Here is the full solution:

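from io import StringIO
import pandas as pd

u_cols = ['page_id','web_id']

s = """page_id | web_id
3|0
7|3
11|4
15|5
19|6"""

# A regex separator requires the Python parser engine; the raw string keeps the backslashes literal.
df = pd.read_csv(StringIO(s), sep=r"\s*\|\s*", usecols=u_cols, engine="python")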

output

   page_id  web_id
0        3       0
1        7       3
2       11       4
3       15       5
4       19       6
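If a regex separator is not wanted, another option might be to let usecols compare stripped names and then clean the header after loading. This is only a sketch and not part of the answer above; it relies on usecols accepting a callable, and the names raw and wanted are illustrative.

```python
from io import StringIO
import pandas as pd

wanted = ['page_id', 'web_id']  # stands in for the dynamically generated list
raw = """page_id | web_id
3|0
7|3
11|4
"""

# usecols may be a callable that receives each raw column name;
# stripping inside it makes the match ignore the padding.
df = pd.read_csv(StringIO(raw), sep="|",
                 usecols=lambda name: name.strip() in wanted)

# The header still carries the padding, so strip it once after loading.
df.columns = df.columns.str.strip()
print(df.columns.tolist())
```

This keeps the default C parser, at the cost of one extra line to normalise the column names.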

Upvotes: 3
