Reputation: 39
I have a pandas data frame comming from a data base request that is somehow like that:
index | id1 | id2 | param | val1 | val2 |
---|---|---|---|---|---|
0 | 1 | 10 | p1 | 100 | A |
1 | 1 | 10 | p2 | 200 | B |
2 | 2 | 20 | p1 | 300 | C |
3 | 2 | 20 | p2 | 400 | D |
df = pd.DataFrame({'id1': ['i1', 'i1', 'i2', 'i2'],
'id2': ['i10', 'i10', 'i20', 'i20'],
'param': ['p1', 'p2', 'p1', 'p2'],
'val1': [100, 200, 300, 400],
'val2': ['A', 'B', 'C', 'D']})
id1 and id2 are different indexes, val1 and val2 are possible values, param are different parameters. The combination of id1, id2, param is unique per table. val1 values are numbers, val2 values are strings. For id1 and id2 I chose strings for this example, in reality it could also be date/times.
I now want to have it in that format:
index | id1 | id2 | p1/val1 | p1/val2 | p2/val1 | p2/val2 |
---|---|---|---|---|---|---|
0 | 1 | 10 | 100 | A | 200 | B |
1 | 2 | 20 | 300 | C | 400 | D |
In reality the table might have missing entries -> is not filled fully.
My problem is now:
Any idea how I can reshape my table without using boring loops? The tabels can be very long in reality with hundrets of parameters and thousands ore even millions of lines.
Thanks a lot in advance.
Upvotes: 1
Views: 1029
Reputation: 862791
Hmmm, pivot
here working in last pandas versions:
df1 = df.pivot(['id1','id2'], 'param')
print (df1)
val1 val2
param p1 p2 p1 p2
id1 id2
i1 i10 100 200 A B
i2 i20 300 400 C D
If need also processing MultiIndex in columns
:
df1 = df.pivot(['id1','id2'], 'param').sort_index(axis=1, level=1)
df1.columns = df1.columns.map(lambda x: f'{x[1]}/{x[0]}')
df1 = df1.reset_index()
print (df1)
id1 id2 p1/val1 p1/val2 p2/val1 p2/val2
0 i1 i10 100 A 200 B
1 i2 i20 300 C 400 D
If use oldier pandas versions:
df1 = df.set_index(['id1','id2', 'param']).unstack().sort_index(axis=1, level=1)
df1.columns = df1.columns.map(lambda x: f'{x[1]}/{x[0]}')
df1 = df1.reset_index()
print (df1)
id1 id2 p1/val1 p1/val2 p2/val1 p2/val2
0 i1 i10 100 A 200 B
1 i2 i20 300 C 400 D
Upvotes: 0
Reputation: 1413
you can do:
df = pd.DataFrame({'id1': ['1', '1', '2', '2'],
'id2': ['10', '10', '20', '20'],
'param': ['p1', 'p2', 'p1', 'p2'],
'val1': [100, 200, 300, 400],
'val2': ['A', 'B', 'C', 'D']})
df_1=df.pivot_table(index=['id1','id2'],columns='param',aggfunc='first')
df_1.columns=['/'.join(i) for i in df_1.columns]
df_1=df_1.reset_index()
df_1
Out[86]:
id1 id2 val1/p1 val1/p2 val2/p1 val2/p2
0 1 10 100 200 A B
1 2 20 300 400 C D
You can later reorder the columns to your liking.
Upvotes: 0