Esteban Smith

Reputation: 95

How to create a pandas DataFrame from a CSV without a delimiter (in Python)

I would like to create a DataFrame from a CSV file that contains several columns but no delimiter character. The column entries appear to be separated only by varying amounts of whitespace.

Also, there are some header rows at the top of the csv that contain readme information without any columns at all.

I am having trouble doing this with pd.read_csv().

Thank you!

The file looks something like this:

This is a header of the textfile.The header has no columns.
This is a header of the textfile.The header has no columns.
This is a header of the textfile.The header has no columns.

...
P-X1-6030-07-A01    368963
P-X1-6030-08-A01    368964
P-X1-6030-09-A01    368965
P-A-1-1011-14-G-01  368967
P-A-1-1014-01-G-05  368968
P-A-1-1017-02-D-01  368969
...

Upvotes: 4

Views: 5913

Answers (2)

MaxU - stand with Ukraine

Reputation: 210912

Assuming you have the following data file:

This is a header of the textfile.The header has no columns.
This is a header of the textfile.The header has no columns.
This is a header of the textfile.The header has no columns.

P X1 6030-07-A01    368963
P-X1-6030-07-A01    368963
P-X1-6030-08-A01    368964
P-X1-6030-09-A01    368965
P-A-1-1011-14-G-01  368967
P-A-1-1014-01-G-05  368968
P-A-1-1017-02-D-01  368969

Solution: use pd.read_fwf(), which reads fixed-width formatted lines:

In [192]: fn = r'D:\temp\.data\data.fwf'

In [193]: pd.read_fwf(fn, widths=[19, 7], skiprows=4, header=None)
Out[193]:
                    0       1
0    P X1 6030-07-A01  368963   # NOTE: first column has spaces ...
1    P-X1-6030-07-A01  368963
2    P-X1-6030-08-A01  368964
3    P-X1-6030-09-A01  368965
4  P-A-1-1011-14-G-01  368967
5  P-A-1-1014-01-G-05  368968
6  P-A-1-1017-02-D-01  368969
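
For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same call. It assumes the layout shown above (three readme lines, one blank line, then rows whose first field is padded to 20 characters); the `readme`, `rows`, and `body` names are only for illustration.

```python
import io

import pandas as pd

# Assumed layout: three readme lines, one blank line, then fixed-width rows.
readme = "This is a header of the textfile.The header has no columns.\n" * 3 + "\n"

rows = [
    ("P X1 6030-07-A01", 368963),  # first field contains spaces
    ("P-X1-6030-07-A01", 368963),
    ("P-X1-6030-08-A01", 368964),
    ("P-X1-6030-09-A01", 368965),
    ("P-A-1-1011-14-G-01", 368967),
    ("P-A-1-1014-01-G-05", 368968),
    ("P-A-1-1017-02-D-01", 368969),
]
# Pad the first field to 20 characters so every row has the same layout.
body = "".join(f"{code:<20}{number}\n" for code, number in rows)

# io.StringIO stands in for the file path used in the session above.
df = pd.read_fwf(io.StringIO(readme + body), widths=[19, 7], skiprows=4, header=None)
print(df)
```

Because the columns are cut by character position, the spaces inside the first field of the first row are kept rather than treated as separators. The widths have to match the real file, so check them against a few lines first.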

Upvotes: 4

A.Kot

Reputation: 7913

pd.read_csv(filename, delim_whitespace=True, skiprows=n, header=None)  # n = number of readme lines to skip
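
A runnable sketch of this approach on the sample rows from the question, assuming three readme lines followed by one blank line. It uses `sep=r"\s+"`, which the pandas documentation describes as equivalent to `delim_whitespace=True`; newer pandas releases (2.2 onward) deprecate `delim_whitespace`. The column names `code` and `value` are made up for the example.

```python
import io

import pandas as pd

# Assumed layout: three readme lines, one blank line, then the data rows.
raw = (
    "This is a header of the textfile.The header has no columns.\n" * 3
    + "\n"
    + "P-X1-6030-07-A01    368963\n"
    + "P-X1-6030-08-A01    368964\n"
    + "P-X1-6030-09-A01    368965\n"
    + "P-A-1-1011-14-G-01  368967\n"
    + "P-A-1-1014-01-G-05  368968\n"
    + "P-A-1-1017-02-D-01  368969\n"
)

# io.StringIO stands in for the real filename.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",               # split on any run of whitespace
    skiprows=4,               # skip the readme lines and the blank line
    header=None,              # the data rows have no header line
    names=["code", "value"],  # hypothetical column names
)
print(df)
```

This only works if no field contains whitespace itself. A row such as `P X1 6030-07-A01    368963` would be split into four fields, in which case a fixed-width reader is the safer choice.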

Upvotes: 1
