lost
lost

Reputation: 2391

Pandas pivot using selected values as index

I read this excellent guide to pivoting but I can't work out how to apply it to my case. I have tidy data like this:

>>> import pandas as pd
>>> df = pd.DataFrame({
...    'case': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', ],
...    'perf_var': ['num', 'time', 'num', 'time', 'num', 'time', 'num', 'time'],
...    'perf_value': [1, 10, 2, 20, 1, 30, 2, 40] 
...     }
...     )
>>>
>>> df
  case perf_var  perf_value
0    a      num           1
1    a     time          10
2    a      num           2
3    a     time          20
4    b      num           1
5    b     time          30
6    b      num           2
7    b     time          40

What I want is:

to give:

case a   b
1.0  10  30
2.0  20  40

All the pivot examples I can see have the index and values in separate columns, but the above seems like a valid/common "tidy" data case to me (I think?). Is it possible to pivot from this?

Upvotes: 4

Views: 161

Answers (1)

sammywemmy
sammywemmy

Reputation: 28644

You need a bit of preprocessing to get your final result :

   (df.assign(num=np.where(df.perf_var == "num",
                           df.perf_value, 
                           np.nan),
             time=np.where(df.perf_var == "time", 
                           df.perf_value, 
                           np.nan))
      .assign(num=lambda x: x.num.ffill(),
              time=lambda x: x.time.bfill())
      .loc[:, ["case", "num", "time"]]
      .drop_duplicates()
      .pivot("num", "case", "time"))


case       a    b
num     
1.0     10.0    30.0
2.0     20.0    40.0

An alternative route to the same end point :

(
    df.set_index(["case", "perf_var"], append=True)
    .unstack()
    .droplevel(0, 1)
    .assign(num=lambda x: x.num.ffill(), 
            time=lambda x: x.time.bfill())
    .drop_duplicates()
    .droplevel(0)
    .set_index("num", append=True)
    .unstack(0)
    .rename_axis(index=None)
)

Upvotes: 2

Related Questions