Reputation: 12826
I have a text file that is formatted this way:
A00 0010 00000
A001 0011 00000
A00911 0019 00000
A0100 0020 10000
I want to read this file into a DataFrame. So I tried:
import pandas as pd
path = *file path*
df = pd.read_csv(path, sep='\t', header=None)
What I got was a DataFrame with 4 rows and one column.
0
0 A00 0010 00000
1 A001 0011 00000
2 A00911 0019 00000
3 A0100 0020 10000
[4 rows x 1 columns]
This is because the values are not separated by "\t". The number of spaces between the columns varies from row to row, depending on the length of the string.
The desired DataFrame should have four rows and three columns.
0 1 2
0 A000 0010 00000
1 A001 0011 00000
2 A009 0019 00000
3 A0100 0020 10000
[4 rows x 3 columns]
Upvotes: 1
Views: 252
Reputation: 29711
You could pass delim_whitespace=True to read_csv so that any run of whitespace acts as the delimiter, along with dtype=str so the values stay strings and keep their leading zeros, like:
df = pd.read_csv(path, delim_whitespace=True, header=None, dtype=str)
df
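For a self-contained check, here is a minimal sketch that reads the sample rows from the question out of an in-memory string instead of a file. The `sample` variable, the exact number of spaces in it, and the use of `io.StringIO` are assumptions for illustration only. It uses `sep=r"\s+"`, which splits on whitespace the same way and is the spelling newer pandas releases recommend, since `delim_whitespace` has been deprecated since pandas 2.2.

```python
import io
import pandas as pd

# Sample rows from the question; the spacing between fields is assumed,
# it only needs to vary from row to row.
sample = (
    "A00     0010  00000\n"
    "A001    0011  00000\n"
    "A00911  0019  00000\n"
    "A0100   0020  10000\n"
)

# sep=r"\s+" treats any run of whitespace as one delimiter;
# dtype=str keeps every value as text, so leading zeros survive.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, dtype=str)

print(df.shape)
print(df)
```

With a real file, passing the path in place of `io.StringIO(sample)` should behave the same way.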
Upvotes: 5
Reputation: 1980
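Try using a regex for the sep argument, so that one or more spaces count as a single delimiter. Regex separators are handled by the Python parsing engine, so it helps to request it explicitly:
df = pd.read_csv(path, sep=' +', header=None, engine='python')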
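One thing to watch with the regex approach: unless dtype=str is also passed, pandas infers numeric types for the second and third columns and the leading zeros are lost. The sketch below compares both cases on the question's sample rows, again read from an in-memory string (`sample`, `as_numbers`, and `as_text` are hypothetical names used for illustration).

```python
import io
import pandas as pd

# Sample rows from the question; the spacing is assumed.
sample = (
    "A00     0010  00000\n"
    "A001    0011  00000\n"
    "A00911  0019  00000\n"
    "A0100   0020  10000\n"
)

# Regex separator, types inferred: columns 1 and 2 become integers.
as_numbers = pd.read_csv(
    io.StringIO(sample), sep=" +", header=None, engine="python"
)

# Same call with dtype=str: every column stays text.
as_text = pd.read_csv(
    io.StringIO(sample), sep=" +", header=None, engine="python", dtype=str
)

print(as_numbers[1].tolist())
print(as_text[1].tolist())
```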
Upvotes: 1