Reputation: 742
I have been practicing with MultiIndex in pandas, but it does not work as I expected. I think my fundamentals are not solid enough.
from io import StringIO  # on Python 2: from StringIO import StringIO
import pandas as pd
datacsv = StringIO("""\
date,id,a,b
20150209,42366,7644,6366
20150209,52219,2741,1796
20150209,52831,163,145
20150209,53209,1047,862
20150209,53773,31343,22501
20150209,58935,16621,14873
20150209,65464,19838,12177
20150209,65823,4903,2982
20150209,68497,16564,12207
20150209,79230,48714,37355
20150208,42366,7644,6366
20150208,52219,2741,1796
20150208,52831,163,145
20150208,53209,1047,862
20150208,53773,31343,22501
20150208,58935,16621,14873
20150208,65464,19838,12177
20150208,65823,4903,2982
20150208,68497,16564,12207
20150208,79230,48714,37355
""")
df = pd.read_csv(datacsv)
df = df.set_index(['date','id'])
The 'date' level is currently not a datetime. How can I convert 'date' to a datetime type, so that it shows as 2015-02-09?
Upvotes: 2
Views: 101
Reputation: 394091
Why perform the datetime conversion after loading when you can just pass the column name to read_csv via the parse_dates parameter:
In [30]:
import io  # temp1 below is a string holding the CSV text from the question
df = pd.read_csv(io.StringIO(temp1), parse_dates=['date'])
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 20 entries, 0 to 19
Data columns (total 4 columns):
date 20 non-null datetime64[ns]
id 20 non-null int64
a 20 non-null int64
b 20 non-null int64
dtypes: datetime64[ns](1), int64(3)
Additionally, you can specify which column should be treated as the index, so both the datetime conversion and the index assignment happen inside read_csv. Just set parse_dates and index_col:
In [34]:
df = pd.read_csv(io.StringIO(temp1), parse_dates=['date'], index_col=['date'])
type(df.index)
Out[34]:
pandas.tseries.index.DatetimeIndex
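Since the question sets a two-level index on date and id, the same idea can be sketched with a list passed to index_col. This is only a minimal sketch: csv_text is a hypothetical name holding a shortened copy of the question's data, and it assumes read_csv parses a named index level listed in parse_dates, which is worth checking on your pandas version.

```python
from io import StringIO

import pandas as pd

# Shortened copy of the CSV from the question (hypothetical variable name).
csv_text = """date,id,a,b
20150209,42366,7644,6366
20150209,52219,2741,1796
20150208,42366,7644,6366
20150208,52219,2741,1796
"""

# Parse 'date' while reading and build the (date, id) MultiIndex in one call.
df = pd.read_csv(
    StringIO(csv_text),
    parse_dates=["date"],
    index_col=["date", "id"],
)

# The first index level should now hold datetimes instead of integers.
date_level = df.index.get_level_values("date")
print(type(df.index).__name__)
print(date_level.dtype)
```

If the date level still comes back as integers, converting with pd.to_datetime before calling set_index is the fallback.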
Upvotes: 0
Reputation: 176850
You can convert a Series (or column) to datetime using pd.to_datetime and specifying the format.
For instance a Series of integers like the dates in your CSV file can be converted like this:
>>> s = pd.Series([20150207, 20150208, 20150209])
>>> pd.to_datetime(s, format="%Y%m%d")
0 2015-02-07
1 2015-02-08
2 2015-02-09
dtype: datetime64[ns]
So to change the date column before you set the index, you could write:
df['date'] = pd.to_datetime(df['date'], format="%Y%m%d")
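Put together with the index step from the question, the whole flow might look like the sketch below. It uses a shortened copy of the question's data; csv_text and rows are hypothetical names chosen for illustration, and the sort_index call is an optional extra rather than part of the original code.

```python
from io import StringIO

import pandas as pd

# Shortened copy of the CSV from the question (hypothetical variable name).
csv_text = """date,id,a,b
20150209,42366,7644,6366
20150209,52219,2741,1796
20150208,42366,7644,6366
20150208,52219,2741,1796
"""

df = pd.read_csv(StringIO(csv_text))

# Convert the integer dates (e.g. 20150209) to datetimes first...
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")

# ...then build the MultiIndex; sorting keeps later lookups predictable.
df = df.set_index(["date", "id"]).sort_index()

# Select every row for one day; the result is indexed by 'id' only.
rows = df.xs(pd.Timestamp("2015-02-09"), level="date")
print(rows)
```

Converting before set_index keeps the conversion a plain column assignment, so the index levels do not have to be rebuilt afterwards.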
Upvotes: 3