user8766186
user8766186

Reputation: 81

Pandas: mapping many ids to unique id

I have two data sets:

One that has ids that can change (df1):

|many_id|data1|data2|
 -------------------
|abc    |value|value|
|efg    |value|value|

One that has the unique identifier mapper (df2):

|unique_id|[many_id]      |
 -------------------------
|123      |[hij, abc]     |
|234      |[klm, nop, qrs]|
|345      |[efg]          |

I want to be able to map many_id to unique_id:

|many_id|data1|data2|unique_id|
 -----------------------------
|abc    |value|value|123      |
|efg    |value|value|345      |

In the quickest process possible for example, if it were possible merge on many_id from df1 to [many_id] array from df2.

The method I used was to break many_id down into rows:

|unique_id|many_id|
|123      |hij    |
|123      |abc    |
|234      |klm    |
|234      |nop    |
|234      |qrs    |
|345      |efg    |

And then did a merge from there based on many_id but not sure if that was the most effective way to do so given that I made my dataframe quite a bit larger.

Thanks in advance!

Upvotes: 0

Views: 53

Answers (2)

BENY
BENY

Reputation: 323266

IIUC, flatten your df2 then create the map dataframe

df1.many_id.map(dict(zip(mapdf['many_id'],mapdf['unique_id'])))
Out[158]: 
0    123
1    345
Name: many_id, dtype: int64
#df1['unique_id'] = df1.many_id.map(dict(zip(mapdf['many_id'],mapdf['unique_id'])))

Update you can using this to get what you mentioned

newdf=pd.DataFrame({'unique_id':df2['unique_id'].repeat(df2.many_id.str.len()),'many_id':np.concatenate(df2.many_id.values)})
newdf
Out[174]: 
  many_id  unique_id
0     hij        123
0     abc        123
1     klm        234
1     nop        234
1     qrs        234
2     efg        345

Upvotes: 1

sundance
sundance

Reputation: 2945

Transform your df2 so that it is a table with each many_id on its own row:

d = df2.set_index("unique_id")["many_id"].apply(pd.Series)
many_ids = d.stack().dropna().to_frame("many_id").reset_index()
df1.join(many_ids.set_index("many_id")["unique_id"], on="many_id")

Result:

  many_id  data1  data2  unique_id
0     abc  value  value        123
1     efg  value  value        345

Upvotes: 1

Related Questions