Reputation: 1247
I have 2 dataframes, one containing some unique values, and another one containing unique groups of this unique values, with group ids, where each value appears in one group and one group only.
df1: df2:
groups ids
0 A 0 (A, D, F) 1
1 B 1 (C, E) 2
2 C 2 (B, K, L) 3
3 D .
. .
.
.
Is there an efficient way to map values from the first dataframe with ids from the second? I got the result using 2 for loops but it is very slow, and tried using 'np.where(df1 in df2["groups"])', but got an array of None.
Desired output:
df3:
id
0 A 1
1 B 3
2 C 2
3 D 1
.
.
.
Upvotes: 2
Views: 404
Reputation: 863166
If in column groups
are tuples use nested dictionary comprehension with Series.map
:
d = {x: b for a, b in zip(df2['groups'], df2['ids']) for x in a}
print (d)
{'A': 1, 'D': 1, 'F': 1, 'C': 2, 'E': 2, 'B': 3, 'K': 3, 'L': 3}
df1 = pd.DataFrame({'col':list('ABCD')})
df1['ids'] = df1['col'].map(d)
print (df1)
col ids
0 A 1
1 B 3
2 C 2
3 D 1
If data are Series:
s = pd.Series(list('ABCD'))
df3 = s.to_frame('col')
df3['ids'] = df3['col'].map(d)
print (df3)
col ids
0 A 1
1 B 3
2 C 2
3 D 1
Upvotes: 0
Reputation: 3770
using pd.unstack
and pd.merge
ids = df2.pop('ids')
temp_df2 = pd.DataFrame(df2['groups'].tolist(),ids).unstack().reset_index(name=1)
print(df.merge(temp_df2, on=1).drop(['level_0'],1))
1 ids
0 A 1
1 B 3
2 C 2
3 D 1
Upvotes: 1