Reputation: 1357
I have these two tables:
T1
id x y
8 42 1.9
9 30 1.9
T2
id signal
8 55
8 56
8 59
9 57
9 58
9 60
The goal is to get the new table T3:
id x y s1 s2 s3
8 42 1.9 55 56 59
9 30 1.9 57 58 60
If I do this operation, it only performs the merge without transposing:
pd.merge(T1, T2, on=['id'])
How can I create the columns s1, s2 and s3, each corresponding to one of the rows (the number of rows per id is always fixed and equal to 3)?
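For reference, the two input tables can be built like this (a minimal setup sketch, not part of the original question):
import pandas as pd
T1 = pd.DataFrame({'id': [8, 9], 'x': [42, 30], 'y': [1.9, 1.9]})
T2 = pd.DataFrame({'id': [8, 8, 8, 9, 9, 9], 'signal': [55, 56, 59, 57, 58, 60]})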
Upvotes: 2
Views: 2389
Reputation: 210852
UPDATE:
As @Jeff has written in his comment, @ubuntu's solution should be faster and more idiomatic than mine:
In [40]: T1.merge(T2.pivot_table(index='id',
   ....:                         values='signal',
   ....:                         columns='s' + T2.groupby(['id'])['signal'].cumcount().astype(str))
   ....:            .reset_index())
Out[40]:
id x y s0 s1 s2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
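For reference, the same updated approach written as a plain script (a sketch, reusing the T1/T2 frames constructed in the question; aggfunc='first' is spelled out because pivot_table aggregates duplicates with the mean by default, which is harmless here since cumcount makes every (id, column) pair unique):
# label each row within its id group: s0, s1, s2
cols = 's' + T2.groupby('id')['signal'].cumcount().astype(str)
T3 = T1.merge(T2.pivot_table(index='id', columns=cols, values='signal',
                             aggfunc='first')
                .reset_index())
# expected columns: id, x, y, s0, s1, s2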
OLD answer:
You can do it this way:
In [209]: %paste
(t1.set_index('id')
.join(t2.groupby('id')['signal']
.apply(lambda x: x.tolist())
.apply(pd.Series))
.reset_index()
)
## -- End pasted text --
Out[209]:
id x y 0 1 2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
Explanation:
Group T2 by id and "collect" all corresponding signals into lists:
In [211]: t2.groupby('id')['signal'].apply(lambda x: x.tolist())
Out[211]:
id
8 [55, 56, 59]
9 [57, 58, 60]
Name: signal, dtype: object
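On reasonably recent pandas versions the same collection step is often written with agg instead of the lambda; a minimal equivalent sketch:
t2.groupby('id')['signal'].agg(list)   # same per-id lists as Out[211]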
Expand the lists into columns:
In [213]: t2.groupby('id')['signal'].apply(lambda x: x.tolist()).apply(pd.Series)
Out[213]:
0 1 2
id
8 55 56 59
9 57 58 60
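apply(pd.Series) is convenient but tends to be slow on larger frames; a common faster alternative (a sketch, assuming the same t2 frame) builds the expanded frame from the collected lists directly:
lists = t2.groupby('id')['signal'].apply(list)
pd.DataFrame(lists.tolist(), index=lists.index)   # columns 0, 1, 2, one row per id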
Finally, join both tables on the id index.
PS: if you want to rename all the numeric columns, you can do it this way (assuming that you saved the result into the rslt DataFrame):
In [224]: rslt.columns = [c if c in ['id','x','y'] else 's{}'.format(c) for c in rslt.columns.tolist()]
In [225]: rslt
Out[225]:
id x y s0 s1 s2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
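If you want the s1/s2/s3 names from the question instead of s0/s1/s2, the integer columns of the join result can also be mapped directly (a sketch, applied before the renaming above):
rslt = rslt.rename(columns={i: 's{}'.format(i + 1) for i in range(3)})
# id, x, y, s1, s2, s3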
Upvotes: 4
Reputation: 879701
Yet another way is to use groupby/cumcount/pivot:
import pandas as pd
T1 = pd.DataFrame({'id': [8, 9], 'x': [42, 30], 'y': [1.9, 1.9]})
T2 = pd.DataFrame({'id': [8, 8, 8, 9, 9, 9], 'signal': [55, 56, 59, 57, 58, 60]})
T2['col'] = 's' + T2.groupby(['id'])['signal'].cumcount().astype(str)
T2 = T2.pivot(index='id', columns='col', values='signal').reset_index()
result = pd.merge(T1, T2)
print(result)
yields
id x y s0 s1 s2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
The main trick is to use groupby/cumcount
to add a per-group cumulative count to T2:
In [81]: T2['col'] = 's' + T2.groupby(['id'])['signal'].cumcount().astype(str); T2
Out[81]:
id signal col
0 8 55 s0
1 8 56 s1
2 8 59 s2
3 9 57 s0
4 9 58 s1
5 9 60 s2
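If you prefer the s1/s2/s3 labels asked for in the question, the counter can simply be shifted by one, a small variation on the same trick:
T2['col'] = 's' + (T2.groupby('id')['signal'].cumcount() + 1).astype(str)
# col becomes s1, s2, s3 within each id group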
Then pivot can be used to reshape T2 into (or at least close to) the desired form:
In [82]: T2 = T2.pivot(index='id', columns='col', values='signal').reset_index(); T2
Out[82]:
col id s0 s1 s2
0 8 55 56 59
1 9 57 58 60
and result can be obtained by merging:
In [83]: pd.merge(T1, T2)
Out[83]:
id x y s0 s1 s2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
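One cosmetic note: after the pivot the frame keeps col as the name of its column index, which is why col still appears in the header of Out[82]; the merged result above no longer displays it, but it can also be cleared explicitly:
T2.columns.name = None   # drop the leftover 'col' label from the column index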
Upvotes: 3
Reputation: 2015
Here is my way of using groupby and unstack:
df = df1.merge(df2.groupby('id')['signal'].apply(lambda x: x.reset_index(drop=True)).unstack().reset_index())
df
Out[63]:
id x y 0 1 2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
If I separate the steps:
df2t = df2.groupby('id')['signal'].apply(lambda x: x.reset_index(drop=True)).unstack().reset_index()
df2t
Out[59]:
id 0 1 2
0 8 55 56 59
1 9 57 58 60
df = df1.merge(df2t)
df
Out[61]:
id x y 0 1 2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
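A closely related variant avoids the apply entirely: index by id plus a per-id counter from cumcount, then unstack the counter level (a sketch, assuming the same df1/df2 frames):
counter = df2.groupby('id').cumcount()
df2t = df2.set_index(['id', counter])['signal'].unstack().reset_index()
df = df1.merge(df2t)   # id, x, y, 0, 1, 2 as above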
Upvotes: 2