Reputation: 1801
I have a .DAT file with rows like these:
2016 01 01 00 00 19 348 2.05 7 618.4
2016 01 01 00 01 19 351 2.05 7 618.4
2016 01 01 00 02 18 0 2.05 7 618.4
2016 01 01 00 03 17 353 2.05 7 618.4
2016 01 01 00 04 19 346 2.02 7 618.4
2016 01 01 00 05 20 345 2.00 7 618.4
2016 01 01 00 06 22 348 1.97 7 618.4
.......
the data format is:
year month day hour minute(HST) wind_speed(kts) wind_direction(dec) temperature(C) relative_humidity(%) pressure
I want to import the .DAT file into a pandas dataframe, with the year-month-day-hour-minute as a single index column, and the rest of the values as separate columns.
Any suggestions?
Thanks !!
Upvotes: 3
Views: 12177
Reputation: 210982
Here is a bit faster version:
In [86]: df = (pd.read_csv(fn, sep='\s+', header=None,
...: parse_dates={'Date':[0,1,2,3,4]},
...: date_parser=lambda x: pd.to_datetime(x, format='%Y %m %d %H %M'))
...: .set_index('Date'))
...:
In [87]: df
Out[87]:
5 6 7 8 9
Date
2016-01-01 00:00:00 19 348 2.05 7 618.4
2016-01-01 00:01:00 19 351 2.05 7 618.4
2016-01-01 00:02:00 18 0 2.05 7 618.4
2016-01-01 00:03:00 17 353 2.05 7 618.4
2016-01-01 00:04:00 19 346 2.02 7 618.4
2016-01-01 00:05:00 20 345 2.00 7 618.4
2016-01-01 00:06:00 22 348 1.97 7 618.4
In [88]: cols_str = 'wind_speed(kts) wind_direction(dec) temperature(C) relative_humidity(%) pressure'
...: cols = cols_str.split()
...:
In [89]: cols
Out[89]:
['wind_speed(kts)',
'wind_direction(dec)',
'temperature(C)',
'relative_humidity(%)',
'pressure']
In [90]: df.columns = cols
In [91]: df
Out[91]:
wind_speed(kts) wind_direction(dec) temperature(C) relative_humidity(%) pressure
Date
2016-01-01 00:00:00 19 348 2.05 7 618.4
2016-01-01 00:01:00 19 351 2.05 7 618.4
2016-01-01 00:02:00 18 0 2.05 7 618.4
2016-01-01 00:03:00 17 353 2.05 7 618.4
2016-01-01 00:04:00 19 346 2.02 7 618.4
2016-01-01 00:05:00 20 345 2.00 7 618.4
2016-01-01 00:06:00 22 348 1.97 7 618.4
Upvotes: 1
Reputation: 863741
You can use read_csv
:
import pandas as pd
import numpy as np
from pandas.compat import StringIO
import datetime as dt
temp=u"""2016 01 01 00 00 19 348 2.05 7 618.4
2016 01 01 00 01 19 351 2.05 7 618.4
2016 01 01 00 02 18 0 2.05 7 618.4
2016 01 01 00 03 17 353 2.05 7 618.4
2016 01 01 00 04 19 346 2.02 7 618.4
2016 01 01 00 05 20 345 2.00 7 618.4
2016 01 01 00 06 22 348 1.97 7 618.4"""
#after testing replace StringIO(temp) to filename
parser = lambda date: pd.datetime.strptime(date, '%Y %m %d %H %M')
df = pd.read_csv(StringIO(temp),
sep="\s+", #separator whitespace
index_col=0, #convert first column to datetimeindex
date_parser=parser, #function for converting dates
parse_dates=[[0,1,2,3,4]], #columns to datetime
header=None) #none header
Then need set column names, because if use parameter names
get:
NotImplementedError: file structure not yet supported
df.columns = ['wind_speed(kts)', 'wind_direction(dec)', 'temperature(C)', 'relative_humidity(%)', 'pressure']
#remove index name
df.index.name = None
print (df)
wind_speed(kts) wind_direction(dec) temperature(C) \
2016-01-01 00:00:00 19 348 2.05
2016-01-01 00:01:00 19 351 2.05
2016-01-01 00:02:00 18 0 2.05
2016-01-01 00:03:00 17 353 2.05
2016-01-01 00:04:00 19 346 2.02
2016-01-01 00:05:00 20 345 2.00
2016-01-01 00:06:00 22 348 1.97
relative_humidity(%) pressure
2016-01-01 00:00:00 7 618.4
2016-01-01 00:01:00 7 618.4
2016-01-01 00:02:00 7 618.4
2016-01-01 00:03:00 7 618.4
2016-01-01 00:04:00 7 618.4
2016-01-01 00:05:00 7 618.4
2016-01-01 00:06:00 7 618.4
print (df.dtypes)
wind_speed(kts) int64
wind_direction(dec) int64
temperature(C) float64
relative_humidity(%) int64
pressure float64
dtype: object
print (df.index)
DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 00:01:00',
'2016-01-01 00:02:00', '2016-01-01 00:03:00',
'2016-01-01 00:04:00', '2016-01-01 00:05:00',
'2016-01-01 00:06:00'],
dtype='datetime64[ns]', freq=None)
Upvotes: 3