Reputation: 1986
I have problem with a pandas DataFrame that has the date as index value and the time as columns value, e.g.
states = pd.DataFrame([[1, 2], [4, 5]], columns=(
'00:00', '01:00'), index = ['2007-01-01', '2007-01-02'])
The result should look like this:
00:00 01:00
2007-01-01 1 2
2007-01-02 4 5
I'd now like to convert this to a Series looking like:
2007-01-01 00:00 1
2007-01-01 01:00 2
2007-01-02 00:00 4
2007-01-02 01:00 5
with the whole timestamp as index. I tried to concat the Series
blib = states.iloc[0]
blub = states.iloc[1]
pd.concat([blib, blub])
but I then lose the date information.
Upvotes: 1
Views: 330
Reputation: 96
You can achieve this by a performing the concat you started with and extending it to the other axis. Creating two DataFrames from these series allows you to merge them back together and create the index you want:
blib = states.iloc[0]
blub = states.iloc[1]
times_df = pd.DataFrame(pd.concat([blib, blub]), columns=['value'])
times_df.reset_index(inplace=True)
# Renaming index so that there aren't two 'index' columns after merge
times_df.rename(columns={'index': 'time'}, inplace=True)
a = states['00:00']
b = states['01:00']
dates_df = pd.DataFrame(pd.concat([a, b]), columns=['value'])
dates_df.reset_index(inplace=True)
dates_df.rename(columns={'index': 'date'}, inplace=True)
# Merging gives the four rows you want but still need to create index
merged = pd.merge(times_df, dates_df, on='value')
merged['datetime'] = pd.to_datetime(merged['date'] + ' ' + merged['time'])
merged.drop(['date', 'time'], axis=1, inplace=True)
merged.set_index('datetime', drop=True, inplace=True)
result = merged['value']
The output of result:
datetime
2007-01-01 00:00:00 1
2007-01-02 00:00:00 4
2007-01-01 01:00:00 2
2007-01-02 01:00:00 5
Name: value, dtype: int64
If you would prefer the index as a string change the merged['datetime'] = ...
line to this instead:
merged['datetime'] = merged['date'] + ' ' + merged['time']
Result will then be exactly as you asked:
datetime
2007-01-01 00:00 1
2007-01-01 01:00 2
2007-01-02 00:00 4
2007-01-02 01:00 5
Name: value, dtype: int64
EDIT: I realise this isn't very efficient though, the multi-index method is probably more promising.
Upvotes: 0
Reputation: 1986
The suggestions of farhawa helps a lot to unify the MultiIndex into a single Index (which makes sense in my case as the index is a datetime and the indices weren't really multidimensional. Most time the MultiIndex is probably the better solution).
We use the suggestion from farhawa
pp = states.stack()
and get a MultiIndex
pp.index
Out[18]:
MultiIndex(levels=[[u'2007-01-01', u'2007-01-02'], [u'00:00', u'01:00']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
Combining these to a new index and reassigning solves the problem:
newIndex = [pp.index.levels[0][l0] + ' ' + pp.index.levels[1][l1]
for l0, l1 in zip(pp.index.labels[0], blub.index.labels[1])]
newIndex = pd.DatetimeIndex(newIndex)
pp = pd.Series(pp.values,index = newIndex)
May not be the most efficient way by recreating the Series, therefore I'd by happy for any better suggestions.
Upvotes: 0
Reputation: 10407
You can use stack()
:
In [1]: states.stack()
Out[1]:
2007-01-01 00:00 1
01:00 2
2007-01-02 00:00 4
01:00 5
dtype: int64
In [2]: type(states.stack())
Out[2]: pandas.core.series.Series
Upvotes: 1