I am working in Python with NCEI marine data, which are headerless .dat files (the files are at https://www.ncei.noaa.gov/data/marine/icoads3.0/). They look like:
166210151200 4962 35378 1306 101134 NL 1585 26 165 17796730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000003002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0493800N 102600E493700N 2 1TENERIFE 0 21662101512 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015WZW 7.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES ZZO MOU (?) KOELTE 00000000CLIWOC VERSION 1.0
166210161300 4907 35215 1306 101134 NL 1585 26 165 17797730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000013002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0490400N 84800E 1 1TENERIFE 0 21662101612 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015ZW 1/2 N 18.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES ZZO MOU KOELTE 00000000CLIWOC VERSION 1.0
166210171300 4812 35000 1306 101134 NL 1695 26 165 17680730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000023002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0483000N 63900E480700N 2 1TENERIFE 0 21662101712 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015ZWTW 15.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES ZTO MOU KOELTE MOOI WEER 00000000CLIWOC VERSION 1.0
166210181300 4758 34925 1306 101134 NL 1695 26 165 17670730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000033002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0474100N 55400E473500N 2 1TENERIFE 0 21662101812 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015ZWTW 11.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES ZTO MOU KOELTE 'ENN MOUT'? REGEN 01000000CLIWOC VERSION 1.0
166210191300 4757 34795 1306 101134 NL 1805 67 165 17672730133 5 0 2FF11FF11AAAAAAAAAAAA 98150000043002199 0 NAN NATIONAAL ARCHIEF OF THE NETHERLANDS DEN HAAG NEDERLAND 1.11.01.01 1229 AANW 112 AAN_1229_112 DUTCH 0473400N 43600E 1 1TENERIFE 0 21662101912 3 VM 8UNKNOWN MAARSEVEEN DUTCH VOC M. GERRITSZ. BOOS OPPERSTUURMAN ROTTERDAM BATAVIA 0 0977.216621015W/Z 14.00 UNKNOWN UNKNOWN UNKNOWN360 DEGREES Z MARSZEILSKOELTE, TOUPKOULTE REGEN 01000000CLIWOC VERSION 1.0
I treated these as tab-delimited files and have been importing them using:
import pandas as pd
data = pd.read_table('file.dat', header=None)
This imports the data as x rows with a single column that holds each whole record. Within that column, the individual fields are separated by whitespace.
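The single column is expected: read_table's default separator is a tab ('\t'), and these lines contain spaces rather than tabs, so nothing gets split. A minimal sketch on two made-up, space-separated lines (hypothetical sample, not real ICOADS records) shows the default behaviour next to an explicit whitespace separator:

```python
import io
import pandas as pd

# Two made-up, space-separated lines standing in for a headerless .dat file
# (hypothetical sample, not real ICOADS records).
sample = "166210151200 4962 35378\n166210161300 4907 35215\n"

# read_table defaults to sep="\t"; there are no tabs, so each whole line
# ends up in a single column.
default = pd.read_table(io.StringIO(sample), header=None)
print(default.shape)  # (2, 1)

# sep=r"\s+" splits on runs of whitespace, which works for this toy sample.
split = pd.read_table(io.StringIO(sample), header=None, sep=r"\s+")
print(split.shape)  # (2, 3)
```

On the real records a whitespace separator is fragile, because free-text fields such as ship and archive names contain spaces too.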
Is there a way to import this data directly into columns, or to take the data variable and split each row into columns on whitespace? I thought that was what read_table was doing. The full dataset is large, so I would prefer a method that parses the columns at import time rather than having to post-process them afterwards.
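For the second option, splitting the already-loaded single column, pandas' str.split with expand=True does the job. A minimal sketch on two hypothetical lines shows the catch: when a free-text field contains a space, that row produces an extra column and the fields no longer line up.

```python
import io
import pandas as pd

# Hypothetical lines: the last field is free text that may contain spaces,
# loosely imitating the ship and archive names in the real records.
sample = "1662 4962 MAARSEVEEN\n1662 4907 DEN HAAG\n"

# Loaded the same way as in the question: one column per line.
data = pd.read_table(io.StringIO(sample), header=None)

# Split column 0 on runs of whitespace into separate columns.
parts = data[0].str.split(expand=True)
print(parts)
```

The second row yields four pieces ("DEN" and "HAAG" land in different columns), while the first row yields three and is padded with a missing value. That misalignment is why a plain whitespace split is unreliable for these files.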
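I think what you need is pandas' fixed-width-format reader, read_fwf. These ICOADS files (IMMA format) are fixed-width rather than tab-delimited: each field occupies set character positions, and read_fwf splits each line on those positions instead of on a separator: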
Code:
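import pandas as pd

df = pd.read_fwf('IMMA.dat', header=None)
print(df)         # the summary ends with the "[rows x columns]" line
print(df.dtypes)  # one dtype per inferred column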
Results:
[17 rows x 66 columns]
0 int64
1 int64
2 int64
3 int64
...
61 object
62 object
63 object
64 object
65 float64
dtype: object
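By default read_fwf infers the column boundaries from the first 100 lines (colspecs='infer', infer_nrows=100), which can misplace a boundary when free-text fields vary in length. Passing explicit colspecs avoids that, and chunksize keeps memory bounded for the full dataset. A minimal sketch under stated assumptions: the positions, names and sample lines below are hypothetical; for real files, take the field positions from the IMMA format documentation.

```python
import io
import pandas as pd

# Hypothetical layout, for illustration only: (start, end) character
# positions, end-exclusive. Not the real IMMA field positions.
colspecs = [(0, 4), (4, 6), (6, 11), (11, 23)]
names = ["year", "month", "lat", "name"]  # hypothetical column names

# Two made-up fixed-width lines; the free-text "name" field contains a space.
sample = (
    "166210 4962MAARSEVEEN  \n"
    "166211 4907DEN HAAG    \n"
)

# chunksize makes read_fwf return an iterator of DataFrames, so a large
# file can be processed piece by piece instead of loaded all at once.
chunks = []
with pd.read_fwf(io.StringIO(sample), colspecs=colspecs, names=names,
                 header=None, chunksize=1) as reader:
    for chunk in reader:
        # Filter or aggregate each chunk here to keep memory use bounded.
        chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
print(df)
```

Because the split is by position, "DEN HAAG" stays in one column, and surrounding padding spaces are stripped. On a real file you would use a much larger chunksize than 1.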