Reputation: 1849
I have a file which has data as follows
1000000 183:0.6673;2:0.3535;359:0.304;363:0.1835
1000001 92:1.0
1000002 112:1.0
1000003 154435:0.746;30:0.3902;220:0.2803;238:0.2781;232:0.2717
1000004 118:1.0
1000005 157:0.484;25:0.4383;198:0.3033
1000006 277:0.7815;1980:0.4825;146:0.175
1000007 4069:0.6678;2557:0.6104;137:0.4261
1000009 2:1.0
I want to read the file to a pandas dataframe seperated by the multiple delimeters \t, :, ;
I tried
df_user_key_word_org = pd.read_csv(filepath+"user_key_word.txt", sep='\t|:|;', header=None, engine='python')
It gives me the following error.
pandas.errors.ParserError: Error could be due to quotes being ignored when a multi-char delimiter is used.
Why am I getting this error?
So I thought I'll try to use the regex string. But I am not sure how to write a split regex. r'\t|:|;' doesn't work.
What is the best way to read a file to a pandas data frame with multiple delimiters?
Upvotes: 19
Views: 63099
Reputation: 7994
From this question, Handling Variable Number of Columns with Pandas - Python, one workaround to pandas.errors.ParserError: Expected 29 fields in line 11, saw 45.
is let read_csv
know about how many columns in advance.
my_cols = [str(i) for i in range(45)] # create some col names
df_user_key_word_org = pd.read_csv(filepath+"user_key_word.txt",
sep="\s+|;|:",
names=my_cols,
header=None,
engine="python")
# I tested with s = StringIO(text_from_OP) on my computer
Hope this works.
Upvotes: 20