Reputation: 1986

Reshape a pandas DataFrame to a Series Element

I have problem with a pandas DataFrame that has the date as index value and the time as columns value, e.g.

states = pd.DataFrame([[1, 2], [4, 5]], columns=(
'00:00', '01:00'), index = ['2007-01-01', '2007-01-02'])

The result should look like this:

            00:00  01:00
2007-01-01      1      2
2007-01-02      4      5

I'd now like to convert this to a Series looking like:

2007-01-01 00:00 1
2007-01-01 01:00 2
2007-01-02 00:00 4
2007-01-02 01:00 5

with the whole timestamp as index. I tried to concat the Series

blib = states.iloc[0]
blub = states.iloc[1]
pd.concat([blib, blub])

but I then lose the date information.

Upvotes: 1

Answers (3)

bastewart

Reputation: 96

You can achieve this by a performing the concat you started with and extending it to the other axis. Creating two DataFrames from these series allows you to merge them back together and create the index you want:

blib = states.iloc[0]
blub = states.iloc[1]

times_df = pd.DataFrame(pd.concat([blib, blub]), columns=['value'])
times_df.reset_index(inplace=True)
# Renaming index so that there aren't two 'index' columns after merge
times_df.rename(columns={'index': 'time'}, inplace=True)

a = states['00:00']
b = states['01:00']

dates_df = pd.DataFrame(pd.concat([a, b]), columns=['value'])
dates_df.reset_index(inplace=True)
dates_df.rename(columns={'index': 'date'}, inplace=True)

# Merging gives the four rows you want but still need to create index
merged = pd.merge(times_df, dates_df, on='value')

merged['datetime'] = pd.to_datetime(merged['date'] + ' ' + merged['time'])
merged.drop(['date', 'time'], axis=1, inplace=True)

merged.set_index('datetime', drop=True, inplace=True)

result = merged['value']

The output of result:

datetime
2007-01-01 00:00:00    1
2007-01-02 00:00:00    4
2007-01-01 01:00:00    2
2007-01-02 01:00:00    5
Name: value, dtype: int64

If you would prefer the index as a string change the merged['datetime'] = ... line to this instead:

merged['datetime'] = merged['date'] + ' ' + merged['time']

Result will then be exactly as you asked:

datetime
2007-01-01 00:00    1
2007-01-01 01:00    2
2007-01-02 00:00    4
2007-01-02 01:00    5
Name: value, dtype: int64

EDIT: I realise this isn't very efficient though, the multi-index method is probably more promising.

Upvotes: 0

MichaelA

Reputation: 1986

The suggestions of farhawa helps a lot to unify the MultiIndex into a single Index (which makes sense in my case as the index is a datetime and the indices weren't really multidimensional. Most time the MultiIndex is probably the better solution).

We use the suggestion from farhawa

pp = states.stack()

and get a MultiIndex

pp.index
Out[18]: 
MultiIndex(levels=[[u'2007-01-01', u'2007-01-02'], [u'00:00', u'01:00']],
       labels=[[0, 0, 1, 1], [0, 1, 0, 1]])

Combining these to a new index and reassigning solves the problem:

newIndex = [pp.index.levels[0][l0] + ' ' + pp.index.levels[1][l1]
                for l0, l1 in zip(pp.index.labels[0], blub.index.labels[1])]
newIndex = pd.DatetimeIndex(newIndex) 
pp = pd.Series(pp.values,index = newIndex)

May not be the most efficient way by recreating the Series, therefore I'd by happy for any better suggestions.

Upvotes: 0

farhawa

Reputation: 10407

You can use stack():

In [1]: states.stack()
Out[1]:
2007-01-01  00:00    1
            01:00    2
2007-01-02  00:00    4
            01:00    5
dtype: int64

In [2]: type(states.stack())
Out[2]: pandas.core.series.Series

Upvotes: 1

Reshape a pandas DataFrame to a Series Element

Answers (3)

Related Questions