Reputation: 173
I'm trying to import a CSV file with pandas read_csv and can't get lines containing the following snippet to parse:
"","",""BSF" code - Intermittant, see notes",""
I am able to get past it with the options error_bad_lines=False, low_memory=False, engine='c'. However, it should be possible to parse these lines correctly. I'm not good with regular expressions, so I haven't tried engine='python', sep=regex yet. Thanks for any help.
Upvotes: 0
Views: 290
Reputation: 2306
Well, that's quite a hard one. Given that all fields are quoted, you could use a regex so that only a comma that is both preceded and followed by " counts as a separator:
data = pd.read_csv(filename, sep=r'(?<="),(?=")', quotechar='"', engine='python')
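However, you will still end up with quotes around all fields. You can strip them afterwards by applying
# on pandas 2.1+, DataFrame.map replaces the deprecated applymap
data = data.applymap(lambda s: s[1:-1] if isinstance(s, str) else s)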
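To see what that separator does to the line from the question, here is a minimal sketch using only the standard re module rather than pandas. The names FIELD_SEP and split_quoted_line are made up for illustration, and the sketch assumes every field on the line is wrapped in double quotes:

```python
import re

# Separator from the answer: a comma preceded by a quote and followed by a quote.
FIELD_SEP = re.compile(r'(?<="),(?=")')

def split_quoted_line(line):
    """Split one fully quoted CSV line and drop the outer quotes of each field."""
    raw_fields = FIELD_SEP.split(line.rstrip("\r\n"))
    # Each raw field still carries its surrounding quotes, so slice them off.
    return [field[1:-1] for field in raw_fields]

# The problematic line from the question.
sample = '"","",""BSF" code - Intermittant, see notes",""'
fields = split_quoted_line(sample)
print(fields)
```

The comma inside the third field is not treated as a separator because it is not sitting between two quotes. The approach would still split incorrectly if a field's own text contained the sequence ",".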
Upvotes: 1