Reputation: 197
This is my code to open the file:
df = pd.read_csv(path_df, delimiter='|')
I get this error: Error tokenizing data. C error: Expected 5 fields in line 13571, saw 6
When I checked that line, I saw a misprint: there were three pipe characters "|||" instead of one. I would prefer to treat double and triple pipes as a single delimiter, though there may be another solution.
How can I solve this problem?
Upvotes: 2
Views: 77
Reputation: 2689
My suspicion is that the file was written incorrectly. If a field was supposed to contain the value "|", a CSV writer would normally quote it and write the line as 1|2|3|"|"|5. If it was mistakenly written without any escaping, you get exactly this issue.
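To illustrate what correctly escaped output looks like, here is a small sketch using Python's standard csv module with "|" as the delimiter. The row values and the column names a..e are made up for the example. With the default quoting (QUOTE_MINIMAL), the writer quotes any field containing the delimiter, and pandas reads it back as one field:

```python
import csv
import io

import pandas as pd

# Write one row whose 4th field is a literal "|".
# QUOTE_MINIMAL (the default) quotes fields that contain the delimiter.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="|", lineterminator="\n")
writer.writerow([1, 2, 3, "|", 5])
line = buf.getvalue()
print(line)  # 1|2|3|"|"|5

# pandas honours the default quotechar '"', so the row still has 5 fields.
df = pd.read_csv(io.StringIO("a|b|c|d|e\n" + line), delimiter="|")
print(df.loc[0, "d"])  # |
```

A file produced without that quoting would contain 1|2|3|||5 instead, which is what trips up the parser.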
In that case I don't think you can solve this with pandas, because the problem is malformed CSV.
If it's a one-off, you can edit the file first, perhaps replacing every "|||" with "||", but again that could have unintended consequences. I've had this trouble before and I don't think there's a better way than manually editing the file (at least pandas gives you the line number to look at!).
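A minimal sketch of that one-off edit, assuming the file fits in memory. The string raw is hypothetical data standing in for the file's text (for a real file you could read it with open(path_df).read() first). Each "|||" becomes "||", so the unescaped "|" value turns into an empty field:

```python
import io

import pandas as pd

# Hypothetical contents: the last row had an unescaped "|" value,
# so it was written as "|||" and has 6 fields instead of 5.
raw = "a|b|c|d|e\n1|2|3|4|5\n1|2|3|||5\n"

# One-off fix: replace each "|||" with "||" (an empty field) before parsing.
# Caution: this also rewrites any legitimate "|||" elsewhere in the file.
fixed = raw.replace("|||", "||")

df = pd.read_csv(io.StringIO(fixed), delimiter="|")
print(df)
```

The lost value shows up as NaN in column d, which at least makes the damaged row easy to find afterwards.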
On the other hand, if it really is just a repeated character misprint, then the other answer will work fine.
Upvotes: 0
Reputation: 1436
Another way to define the delimiter when reading a CSV in pandas is the sep parameter:
df = pd.read_csv(path_df, sep=r'\|+', engine='python')
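The "C error" in the message only means the default C parser raised it. The regex separator is what needs the Python engine: separators longer than one character (other than '\s+') are treated as regular expressions, which only the Python engine supports, so pass engine='python' explicitly in the arguments.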
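One thing to keep in mind with r'\|+': it collapses every run of pipes, including a legitimate empty field written as "||". A small sketch with made-up data shows the effect. The misprinted row parses fine, but a row with a genuinely empty middle field shifts its values left, and the missing last value is filled with NaN:

```python
import io

import pandas as pd

# Hypothetical data: row 1 has a misprint ("|||"),
# row 2 has a genuinely empty field in column b ("||").
raw = "a|b|c\nss|||s|s\nx||z\n"

df = pd.read_csv(io.StringIO(raw), sep=r"\|+", engine="python")
print(df)
# Row 0 is ss, s, s as intended.
# Row 1 becomes a="x", b="z", c=NaN because "||" was collapsed too.
```

So this approach is safest when you know the file never contains empty fields.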
Upvotes: 3
Reputation: 862661
Use the regex separator [|]+ (one or more |):
import pandas as pd
from io import StringIO  # pd.compat.StringIO no longer exists in recent pandas

temp = """a|b|c
ss|||s|s
t|g|e"""

# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep="[|]+", engine='python')
print(df)
    a  b  c
0  ss  s  s
1   t  g  e
Upvotes: 6