Reputation: 21961
1988 1 1 7.88 15.57 25.00 0.00 0.81 4.02
1988 1 2 6.50 10.37 24.87 0.00 0.49 4.30
1988 1 3 6.48 -8.79 21.28 0.00 0.62 3.91
I read the above file as:
df = pd.read_fwf('27_35.txt', header=None, widths=[6,4,4,6,6,6,6,6,6])
The 1st column is year, 2nd column is month and 3rd column is day. How do I create an index out of these 3 columns ('1988-1-1')?
Upvotes: 2
Views: 2859
Reputation: 394041
use parse_dates
param and pass a list in a list (so it understands that you want to treat the list of columns as a single entity for parsing):
In [83]:
t="""1988 1 1 7.88 15.57 25.00 0.00 0.81 4.02
1988 1 2 6.50 10.37 24.87 0.00 0.49 4.30
1988 1 3 6.48 -8.79 21.28 0.00 0.62 3.91"""
df = pd.read_fwf(io.StringIO(t), header=None, widths=[6,4,4,6,6,6,6,6,6], parse_dates=[[0,1,2]])
df
Out[83]:
0_1_2 3 4 5 6 7 8
0 1988-01-01 7.88 1 5.57 2 5.00 0 0.81 4.02
1 1988-01-02 6.50 10.37 24.87 0 0.49 4.30
2 1988-01-03 6.48 -8.79 21.28 0 0.62 3.91
In [84]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 7 columns):
0_1_2 3 non-null datetime64[ns]
3 3 non-null object
4 3 non-null object
5 3 non-null float64
6 3 non-null float64
7 3 non-null float64
8 3 non-null float64
dtypes: datetime64[ns](1), float64(4), object(2)
memory usage: 192.0+ bytes
EDIT
To keep the original date columns add param keep_date_col=True
df = pd.read_fwf('27_35.txt', header=None, widths=[6,4,4,6,6,6,6,6,6], parse_dates=[[0,1,2]], keep_date_col=True)
Upvotes: 5
Reputation: 3928
I would do this:
import pandas as pd
df = pd.DataFrame(data={'Year':[1988, 1988, 1988], 'Month':[1, 1, 1],
'Day': [1, 2, 3]})
date = [str(y)+'-'+str(m)+'-'+str(d) for y, m, d in zip(df.Year, df.Month, df.Day)]
df.index = pd.to_datetime(date)
>>> print df
Day Month Year
1988-01-01 1 1 1988
1988-01-02 2 1 1988
1988-01-03 3 1 1988
After that you can drop Day, Month, and Year from your dataframe.
Upvotes: 1
Reputation: 36545
I'm always fogetting how to deal with dates in Pandas .... You could try this:
df.index = df.apply(lambda row: datetime.datetime(int(row[0]), int(row[1]), int(row[2])),1)
Upvotes: 1