Data4711
Data4711

Reputation: 39

pivot pandas data frame with more then one index / value and strings as values

I have a pandas data frame comming from a data base request that is somehow like that:

index id1 id2 param val1 val2
0 1 10 p1 100 A
1 1 10 p2 200 B
2 2 20 p1 300 C
3 2 20 p2 400 D
df = pd.DataFrame({'id1': ['i1', 'i1', 'i2', 'i2'],
                'id2': ['i10', 'i10', 'i20', 'i20'],
                'param': ['p1', 'p2', 'p1', 'p2'],
                'val1': [100, 200, 300, 400],
                'val2': ['A', 'B', 'C', 'D']})

id1 and id2 are different indexes, val1 and val2 are possible values, param are different parameters. The combination of id1, id2, param is unique per table. val1 values are numbers, val2 values are strings. For id1 and id2 I chose strings for this example, in reality it could also be date/times.

I now want to have it in that format:

index id1 id2 p1/val1 p1/val2 p2/val1 p2/val2
0 1 10 100 A 200 B
1 2 20 300 C 400 D

In reality the table might have missing entries -> is not filled fully.

My problem is now:

Any idea how I can reshape my table without using boring loops? The tabels can be very long in reality with hundrets of parameters and thousands ore even millions of lines.

Thanks a lot in advance.

Upvotes: 1

Views: 1029

Answers (2)

jezrael
jezrael

Reputation: 862791

Hmmm, pivot here working in last pandas versions:

df1 = df.pivot(['id1','id2'], 'param')
print (df1)
        val1      val2   
param     p1   p2   p1 p2
id1 id2                  
i1  i10  100  200    A  B
i2  i20  300  400    C  D

If need also processing MultiIndex in columns:

df1 = df.pivot(['id1','id2'], 'param').sort_index(axis=1, level=1)

df1.columns = df1.columns.map(lambda x: f'{x[1]}/{x[0]}')
df1 = df1.reset_index()
print (df1)
  id1  id2  p1/val1 p1/val2  p2/val1 p2/val2
0  i1  i10      100       A      200       B
1  i2  i20      300       C      400       D

If use oldier pandas versions:

df1 = df.set_index(['id1','id2', 'param']).unstack().sort_index(axis=1, level=1)

df1.columns = df1.columns.map(lambda x: f'{x[1]}/{x[0]}')
df1 = df1.reset_index()
print (df1)

  id1  id2  p1/val1 p1/val2  p2/val1 p2/val2
0  i1  i10      100       A      200       B
1  i2  i20      300       C      400       D

Upvotes: 0

Suhas Mucherla
Suhas Mucherla

Reputation: 1413

you can do:

df = pd.DataFrame({'id1': ['1', '1', '2', '2'],
                'id2': ['10', '10', '20', '20'],
                'param': ['p1', 'p2', 'p1', 'p2'],
                'val1': [100, 200, 300, 400],
                'val2': ['A', 'B', 'C', 'D']})

df_1=df.pivot_table(index=['id1','id2'],columns='param',aggfunc='first')
df_1.columns=['/'.join(i) for i in df_1.columns]
df_1=df_1.reset_index()

df_1
Out[86]: 
  id1 id2  val1/p1  val1/p2 val2/p1 val2/p2
0   1  10      100      200       A       B
1   2  20      300      400       C       D

You can later reorder the columns to your liking.

Upvotes: 0

Related Questions