Reputation: 81
I have two data sets:
One that has ids that can change (df1):
| many_id | data1 | data2 |
|---------|-------|-------|
| abc     | value | value |
| efg     | value | value |
One that has the unique identifier mapper (df2):
| unique_id | [many_id]       |
|-----------|-----------------|
| 123       | [hij, abc]      |
| 234       | [klm, nop, qrs] |
| 345       | [efg]           |
I want to be able to map many_id to unique_id:
| many_id | data1 | data2 | unique_id |
|---------|-------|-------|-----------|
| abc     | value | value | 123       |
| efg     | value | value | 345       |
Ideally this would be as quick as possible, for example by merging many_id from df1 directly against the [many_id] arrays in df2, if that is even possible.
The method I used was to break many_id down into rows:
| unique_id | many_id |
|-----------|---------|
| 123       | hij     |
| 123       | abc     |
| 234       | klm     |
| 234       | nop     |
| 234       | qrs     |
| 345       | efg     |
And then did a merge from there based on many_id, but I'm not sure that was the most efficient approach, given that it made my dataframe quite a bit larger.
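Roughly, this is a sketch of what I did (assuming the [many_id] column in df2 holds Python lists; the toy data below is just the example from the tables above):
import pandas as pd

# Toy data matching the tables above (assumption: [many_id] holds Python lists)
df1 = pd.DataFrame({'many_id': ['abc', 'efg'],
                    'data1': ['value', 'value'],
                    'data2': ['value', 'value']})
df2 = pd.DataFrame({'unique_id': [123, 234, 345],
                    'many_id': [['hij', 'abc'], ['klm', 'nop', 'qrs'], ['efg']]})

# Break each list out into one (unique_id, many_id) row, then merge
rows = [(uid, mid) for uid, mids in zip(df2['unique_id'], df2['many_id']) for mid in mids]
flat = pd.DataFrame(rows, columns=['unique_id', 'many_id'])
result = df1.merge(flat, on='many_id', how='left')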
Thanks in advance!
Upvotes: 0
Views: 53
Reputation: 323266
IIUC, flatten your df2 into a mapping dataframe (mapdf below), then build a dict and use map:
df1.many_id.map(dict(zip(mapdf['many_id'], mapdf['unique_id'])))
Out[158]:
0    123
1    345
Name: many_id, dtype: int64

# df1['unique_id'] = df1.many_id.map(dict(zip(mapdf['many_id'], mapdf['unique_id'])))
Update: you can use this to build the flattened dataframe you mentioned:
newdf = pd.DataFrame({'unique_id': df2['unique_id'].repeat(df2.many_id.str.len()),
                      'many_id': np.concatenate(df2.many_id.values)})
newdf
Out[174]:
  many_id  unique_id
0     hij        123
0     abc        123
1     klm        234
1     nop        234
1     qrs        234
2     efg        345
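Put together, a minimal end-to-end sketch of this approach (toy data assumed to match the question's tables, with df2.many_id holding Python lists):
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'many_id': ['abc', 'efg'],
                    'data1': ['value', 'value'],
                    'data2': ['value', 'value']})
df2 = pd.DataFrame({'unique_id': [123, 234, 345],
                    'many_id': [['hij', 'abc'], ['klm', 'nop', 'qrs'], ['efg']]})

# Flatten df2: repeat each unique_id once per element of its many_id list
mapdf = pd.DataFrame({'unique_id': df2['unique_id'].repeat(df2.many_id.str.len()),
                      'many_id': np.concatenate(df2.many_id.values)})

# Build the many_id -> unique_id dict and map it onto df1
df1['unique_id'] = df1.many_id.map(dict(zip(mapdf['many_id'], mapdf['unique_id'])))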
Upvotes: 1
Reputation: 2945
Transform your df2 so that it is a table with each many_id on its own row:
# Expand each many_id list into its own column, then stack into long format
d = df2.set_index("unique_id")["many_id"].apply(pd.Series)
many_ids = d.stack().dropna().to_frame("many_id").reset_index()

# Join unique_id onto df1 via the many_id column
df1.join(many_ids.set_index("many_id")["unique_id"], on="many_id")
Result:
  many_id  data1  data2  unique_id
0     abc  value  value        123
1     efg  value  value        345
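As a side note, on pandas 0.25+ the flattening step could also be done with DataFrame.explode instead of apply(pd.Series)/stack; a sketch of that variant, assuming many_id holds Python lists:
# Alternative flattening with explode (pandas >= 0.25), assuming many_id holds lists
many_ids = df2.explode("many_id")[["unique_id", "many_id"]]
df1.join(many_ids.set_index("many_id")["unique_id"], on="many_id")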
Upvotes: 1