Reputation: 1357
I have these two tables:
T1
id x y
8 42 1.9
9 30 1.9
T2
id signal
8 55
8 56
8 59
9 57
9 58
9 60
The goal is to get the new table T3:
id x y s1 s2 s3
8 42 1.9 55 56 59
9 30 1.9 57 58 60
If I do this operation, it only performs the merge without transposing:
pd.merge(T1, T2, on=['id'])
How can I create the columns s1, s2 and s3, each corresponding to one of the rows (the number of rows per id is always fixed and equal to 3)?
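For reference, the two input tables can be built like this (a minimal setup sketch, not part of the original question):
import pandas as pd
T1 = pd.DataFrame({'id': [8, 9], 'x': [42, 30], 'y': [1.9, 1.9]})
T2 = pd.DataFrame({'id': [8, 8, 8, 9, 9, 9], 'signal': [55, 56, 59, 57, 58, 60]})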
Upvotes: 2
Views: 2389
Reputation: 210852
UPDATE:
As @Jeff has written in his comment, @ubuntu's solution should be faster and more idiomatic than mine:
In [40]: T1.merge(T2.pivot_table(index='id',
   ....:                         values='signal',
   ....:                         columns='s' + T2.groupby(['id'])['signal'].cumcount().astype(str))
   ....:            .reset_index())
Out[40]:
id x y s0 s1 s2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
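For reference, the same updated approach written as a plain script (a sketch, reusing the T1/T2 frames constructed in the question; aggfunc='first' is spelled out because pivot_table aggregates duplicates with the mean by default, which is harmless here since cumcount makes every (id, column) pair unique):
# label each row within its id group: s0, s1, s2
cols = 's' + T2.groupby('id')['signal'].cumcount().astype(str)
T3 = T1.merge(T2.pivot_table(index='id', columns=cols, values='signal',
                             aggfunc='first')
                .reset_index())
# expected columns: id, x, y, s0, s1, s2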
OLD answer:
You can do it this way:
In [209]: %paste
(t1.set_index('id')
.join(t2.groupby('id')['signal']
.apply(lambda x: x.tolist())
.apply(pd.Series))
.reset_index()
)
## -- End pasted text --
Out[209]:
id x y 0 1 2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
Explanation:
Group T2 by id and "collect" all corresponding signals into lists:
In [211]: t2.groupby('id')['signal'].apply(lambda x: x.tolist())
Out[211]:
id
8 [55, 56, 59]
9 [57, 58, 60]
Name: signal, dtype: object
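On reasonably recent pandas versions the same collection step is often written with agg instead of the lambda; a minimal equivalent sketch:
t2.groupby('id')['signal'].agg(list)   # same per-id lists as Out[211]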
Expand the lists into columns:
In [213]: t2.groupby('id')['signal'].apply(lambda x: x.tolist()).apply(pd.Series)
Out[213]:
0 1 2
id
8 55 56 59
9 57 58 60
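apply(pd.Series) is convenient but tends to be slow on larger frames; a common faster alternative (a sketch, assuming the same t2 frame) builds the expanded frame from the collected lists directly:
lists = t2.groupby('id')['signal'].apply(list)
pd.DataFrame(lists.tolist(), index=lists.index)   # columns 0, 1, 2, one row per id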
Finally, join both tables on the id index.
PS: if you want to rename all the numeric columns, you can do it this way (assuming that you saved the result into the rslt DataFrame):
In [224]: rslt.columns = [c if c in ['id','x','y'] else 's{}'.format(c) for c in rslt.columns.tolist()]
In [225]: rslt
Out[225]:
id x y s0 s1 s2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
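If you want the s1/s2/s3 names from the question instead of s0/s1/s2, the integer columns of the join result can also be mapped directly (a sketch, applied before the renaming above):
rslt = rslt.rename(columns={i: 's{}'.format(i + 1) for i in range(3)})
# id, x, y, s1, s2, s3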
Upvotes: 4
Reputation: 879701
Yet another way is to use groupby/cumcount/pivot:
import pandas as pd
T1 = pd.DataFrame({'id': [8, 9], 'x': [42, 30], 'y': [1.9, 1.9]})
T2 = pd.DataFrame({'id': [8, 8, 8, 9, 9, 9], 'signal': [55, 56, 59, 57, 58, 60]})
T2['col'] = 's' + T2.groupby(['id'])['signal'].cumcount().astype(str)
T2 = T2.pivot(index='id', columns='col', values='signal').reset_index()
result = pd.merge(T1, T2)
print(result)
yields
id x y s0 s1 s2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
The main trick is to use groupby/cumcount
to add a per-group cumulative count to T2:
In [81]: T2['col'] = 's' + T2.groupby(['id'])['signal'].cumcount().astype(str); T2
Out[81]:
id signal col
0 8 55 s0
1 8 56 s1
2 8 59 s2
3 9 57 s0
4 9 58 s1
5 9 60 s2
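If you prefer the s1/s2/s3 labels asked for in the question, the counter can simply be shifted by one, a small variation on the same trick:
T2['col'] = 's' + (T2.groupby('id')['signal'].cumcount() + 1).astype(str)
# col becomes s1, s2, s3 within each id group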
Then pivot can be used to reshape T2 into (or at least close to) the desired form:
In [82]: T2 = T2.pivot(index='id', columns='col', values='signal').reset_index(); T2
Out[82]:
col id s0 s1 s2
0 8 55 56 59
1 9 57 58 60
and result can be obtained by merging:
In [83]: pd.merge(T1, T2)
Out[83]:
id x y s0 s1 s2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
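One cosmetic note: after the pivot the frame keeps col as the name of its column index, which is why col still appears in the header of Out[82]; the merged result above no longer displays it, but it can also be cleared explicitly:
T2.columns.name = None   # drop the leftover 'col' label from the column index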
Upvotes: 3
Reputation: 2015
Here is my way of using groupby and unstack:
df = df1.merge(df2.groupby('id')['signal'].apply(lambda x: x.reset_index(drop=True)).unstack().reset_index())
df
Out[63]:
id x y 0 1 2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
If I separate the steps:
df2t = df2.groupby('id')['signal'].apply(lambda x: x.reset_index(drop=True)).unstack().reset_index()
df2t
Out[59]:
id 0 1 2
0 8 55 56 59
1 9 57 58 60
df = df1.merge(df2t)
df
Out[61]:
id x y 0 1 2
0 8 42 1.9 55 56 59
1 9 30 1.9 57 58 60
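A closely related variant avoids the apply entirely: index by id plus a per-id counter from cumcount, then unstack the counter level (a sketch, assuming the same df1/df2 frames):
counter = df2.groupby('id').cumcount()
df2t = df2.set_index(['id', counter])['signal'].unstack().reset_index()
df = df1.merge(df2t)   # id, x, y, 0, 1, 2 as above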
Upvotes: 2