Dance Party2
Dance Party2

Reputation: 7546

Pandas Dynamic Stack

Given the following data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'foo':['a','b','c','d'],
                   'bar':['e','f','g','h'],
                       0:['i','j','k',np.nan],
                       1:['m',np.nan,'o','p']})
df=df[['foo','bar',0,1]]
df

    foo   bar    0      1
0   a     e      i      m
1   b     f      j     NaN
2   c     g      k      o
3   d     h     NaN     p

...which resulted from a previous procedure that produced columns 0 and 1 (and may have produced more or fewer columns than 0 and 1 depending on the data): I want to somehow stack (if that's the correct term) the data so that each value of 0 and 1 (ignoring NaNs) produces a new row like this:

    foo bar
0   a   e
0   a   i
0   a   m
1   b   f
1   b   j
2   c   g
2   c   k
2   c   o
3   d   h
3   d   p

You probably noticed that the common field is foo. It will likely occur that there are more common fields in my actual data set. Also, I'm not sure how important it is that the index values repeat in the end result across values of foo. As long as the data is correct, that's my main concern.

Update: What if I have 2+ common fields like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'foo':['a','a','b','b'],
                   'foo2':['a2','b2','c2','d2'],
                   'bar':['e','f','g','h'],
                       0:['i','j','k',np.nan],
                       1:['m',np.nan,'o','p']})
df=df[['foo','foo2','bar',0,1]]
df

    foo     foo2    bar     0   1
0   a       a2      e     i     m
1   a       b2      f     j     NaN
2   b       c2      g     k     o
3   b       d2      h     NaN   p

Upvotes: 3

Views: 56

Answers (1)

jezrael
jezrael

Reputation: 863791

You can use set_index, stack and reset_index:

print df.set_index('foo').stack().reset_index(level=1, drop=True).reset_index(name='bar')
  foo bar
0   a   e
1   a   i
2   a   m
3   b   f
4   b   j
5   c   g
6   c   k
7   c   o
8   d   h
9   d   p

If you need index, use melt:

print pd.melt(df.reset_index(), 
              id_vars=['index', 'foo'], 
              value_vars=['bar', 0, 1],
              value_name='bar')
        .sort_values('index')
        .set_index('index', drop=True)
        .dropna()
        .drop('variable', axis=1)
        .rename_axis(None)

  foo bar
0   a   e
0   a   i
0   a   m
1   b   f
1   b   j
2   c   g
2   c   k
2   c   o
3   d   h
3   d   p

Or use not well known lreshape:

print pd.lreshape(df.reset_index(), {'bar': ['bar', 0, 1]})
        .sort_values('index')
        .set_index('index', drop=True)
        .rename_axis(None)

  foo bar
0   a   e
0   a   i
0   a   m
1   b   f
1   b   j
2   c   g
2   c   k
2   c   o
3   d   h
3   d   p

Upvotes: 2

Related Questions