Reputation: 95
I would like to create a dataframe from a csv file that contains several columns but no explicit delimiter. It appears that the column entries are simply separated by varying amounts of whitespace.
Also, there are some header rows at the top of the csv that contain readme information and no columns at all.
I am having trouble doing this with pd.read_csv().
Thank you!
The file looks something like this:
This is a header of the textfile.The header has no columns.
This is a header of the textfile.The header has no columns.
This is a header of the textfile.The header has no columns.
...
P-X1-6030-07-A01 368963
P-X1-6030-08-A01 368964
P-X1-6030-09-A01 368965
P-A-1-1011-14-G-01 368967
P-A-1-1014-01-G-05 368968
P-A-1-1017-02-D-01 368969
...
Upvotes: 4
Views: 5913
Reputation: 210912
Assuming you have the following data file:
This is a header of the textfile.The header has no columns.
This is a header of the textfile.The header has no columns.
This is a header of the textfile.The header has no columns.
P X1 6030-07-A01 368963
P-X1-6030-07-A01 368963
P-X1-6030-08-A01 368964
P-X1-6030-09-A01 368965
P-A-1-1011-14-G-01 368967
P-A-1-1014-01-G-05 368968
P-A-1-1017-02-D-01 368969
Solution: use the read_fwf() method, which reads fixed-width columns (set skiprows to the number of lines that precede the data):
In [192]: fn = r'D:\temp\.data\data.fwf'
In [193]: pd.read_fwf(fn, widths=[19, 7], skiprows=4, header=None)
Out[193]:
                    0       1
0    P X1 6030-07-A01  368963  # NOTE: first column has spaces ...
1    P-X1-6030-07-A01  368963
2    P-X1-6030-08-A01  368964
3    P-X1-6030-09-A01  368965
4  P-A-1-1011-14-G-01  368967
5  P-A-1-1014-01-G-05  368968
6  P-A-1-1017-02-D-01  368969
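For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same read_fwf() idea. The in-memory sample, the 19/6 column widths and the three header lines are assumptions made for illustration; adjust widths and skiprows to match the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: three free-text header lines,
# followed by two fixed-width columns (19 characters, then 6).
rows = [
    ("P X1 6030-07-A01", "368963"),   # first field contains spaces
    ("P-X1-6030-07-A01", "368963"),
    ("P-A-1-1011-14-G-01", "368967"),
]
header = "This is a header of the textfile.The header has no columns.\n" * 3
body = "".join(f"{name:<19}{num}\n" for name, num in rows)
sample = header + body

# widths gives the character width of each column; skiprows drops the
# readme lines; header=None because the data has no column-name row.
df = pd.read_fwf(io.StringIO(sample), widths=[19, 6], skiprows=3, header=None)
print(df)
```

Because the columns are cut by character position, a first field that itself contains spaces stays in one column, which a whitespace-delimited parse would not guarantee.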
Upvotes: 4
Reputation: 7913
pd.read_csv(filename, delim_whitespace=True, skiprows=n, header=None)  # n = number of readme rows to skip; header=None because there is no column-name row
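A runnable sketch of this whitespace-delimited approach, under the assumption that there are exactly three readme lines and that no field contains spaces. The sample text and the column names "tag" and "number" are made up for illustration. It uses sep=r"\s+", which is the regex spelling of the same behaviour; recent pandas releases deprecate delim_whitespace in favour of it.

```python
import io

import pandas as pd

# Hypothetical sample: three readme lines, then two columns separated by
# a varying number of spaces.
sample = (
    "This is a header of the textfile.The header has no columns.\n"
    "This is a header of the textfile.The header has no columns.\n"
    "This is a header of the textfile.The header has no columns.\n"
    "P-X1-6030-07-A01      368963\n"
    "P-X1-6030-08-A01 368964\n"
    "P-A-1-1011-14-G-01   368967\n"
)

df2 = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",                # any run of whitespace separates columns
    skiprows=3,                # skip the readme lines
    header=None,               # the data has no column-name row
    names=["tag", "number"],   # illustrative column names
)
print(df2)
```

Note that this splits on every run of whitespace, so a row such as "P X1 6030-07-A01 368963" would produce more than two fields; for data like that, the fixed-width approach is the safer choice.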
Upvotes: 1