Reputation: 875
Let's say I have a Pandas DataFrame called df_1 that looks like this:
id | date_created | rank_1 | rank_2 | rank_3 | rank_dict |
---|---|---|---|---|---|
2223 | 3/3/21 3:26 | www.google.com | www.yahoo.com | www.ford.com | {www.google.com:3, www.yahoo.com:2, www.ford.com:1} |
1112 | 2/25/21 1:35 | www.autoblog.com | www.motor1.com | www.webull.com | {www.autoblog.com:3, www.motor1.com:2, www.webull.com:1} |
and another DataFrame called df_2 that looks like this:
id | date_created | rank_1 | rank_2 | rank_3 |
---|---|---|---|---|
2223 | 4/9/21 5:15 | www.yahoo.com | www.whatever.com | www.google.com |
1112 | 8/20/21 2:30 | www.gm.com | www.motor1.com | www.webull.com |
I want to create a new column called new_rank_dict in df_2, using the URLs in rank_1, rank_2, rank_3 in df_2 as Keys, with Values created using the following criteria:
For each row in df_2, find the row in df_1 that has the matching id. If the rank_1 URL exists in the Keys of rank_dict from df_1 for that row, assign the same Value it has in that dictionary. If the rank_1 URL doesn't exist in that dictionary, assign it a Value of 0. Do the same for rank_2 and rank_3, which finally gives a dictionary for each row in df_2.
For example, row 1 in df_1 and df_2 share the same id (2223). rank_1 (www.yahoo.com) in df_2 is a Key in rank_dict in df_1, and that Key has a Value of 2, so assign a Value of 2 to the www.yahoo.com Key. rank_2 (www.whatever.com) doesn't exist in rank_dict in df_1, so it gets a Value of 0. rank_3 (www.google.com) does exist in rank_dict in df_1 and its Value is 3, so assign the Value 3 to that Key in the new dictionary. At the end, row 1 in df_2 will have the new_rank_dict: {www.yahoo.com:2, www.whatever.com:0, www.google.com:3}
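For a single row, the rule above amounts to roughly the following plain-Python sketch. The variable names old_ranks and new_urls are made up for illustration and are not columns in my data; dict.get with a default of 0 covers the "URL not found" case.

```python
# Illustrative values copied from the id 2223 rows above.
old_ranks = {"www.google.com": 3, "www.yahoo.com": 2, "www.ford.com": 1}
new_urls = ["www.yahoo.com", "www.whatever.com", "www.google.com"]

# Look up each new URL in the old dictionary, falling back to 0 when absent.
new_rank_dict = {url: old_ranks.get(url, 0) for url in new_urls}
print(new_rank_dict)
```

What I am missing is how to apply this to every row of df_2 at once.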
So the ideal result df_2 should look like this:
id | date_created | rank_1 | rank_2 | rank_3 | rank_dict |
---|---|---|---|---|---|
2223 | 4/9/21 5:15 | www.yahoo.com | www.whatever.com | www.google.com | {www.yahoo.com:2, www.whatever.com:0, www.google.com:3} |
1112 | 8/20/21 2:30 | www.gm.com | www.motor1.com | www.webull.com | {www.gm.com:0, www.motor1.com:2, www.webull.com:1} |
I have been struggling to find a Pythonic way to achieve this efficiently. I have searched the web, and most tutorials show how to create a single dictionary from a Pandas column, rather than a column of dictionaries, which is what I need here.
Any suggestions would be greatly appreciated!
Upvotes: 1
Views: 768
Reputation: 71707
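# Align df1's rank_dict with df2's rows via the shared id
dcts = df2['id'].map(df1.set_index('id')['rank_dict'])
# Select the rank_1, rank_2, rank_3 columns
cols = df2.filter(like='rank')
# For each row, look up every URL in the old dict, defaulting to 0
df2['rank_dict'] = [{u: dct.get(u, 0) for u in urls}
                    for urls, dct in zip(cols.values, dcts)]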
First, map the column rank_dict from df1 to df2 based on the common id:
>>> dcts
0 {'www.google.com': 3, 'www.yahoo.com': 2, 'www.ford.com': 1}
1 {'www.autoblog.com': 3, 'www.motor1.com': 2, 'www.webull.com': 1}
Name: id, dtype: object
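One caveat: if an id in df2 has no match in df1, map returns NaN for that row, and calling .get on it later would raise an error. Below is a minimal sketch of one possible guard, assuming an empty dictionary is an acceptable fallback for unmatched ids. The id 9999 is hypothetical and only there to trigger the case.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [2223],
    "rank_dict": [{"www.google.com": 3, "www.yahoo.com": 2, "www.ford.com": 1}],
})
# id 9999 is hypothetical and has no counterpart in df1.
df2 = pd.DataFrame({"id": [2223, 9999]})

dcts = df2["id"].map(df1.set_index("id")["rank_dict"])

# Unmatched ids come back as NaN; replace anything that is not a dict with {}.
dcts = [d if isinstance(d, dict) else {} for d in dcts]
print(dcts)
```

With this fallback, every URL in an unmatched row would end up with a value of 0 in the final step.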
Now filter the rank-like columns from df2:
>>> cols
rank_1 rank_2 rank_3
0 www.yahoo.com www.whatever.com www.google.com
1 www.gm.com www.motor1.com www.webull.com
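Note that filter(like='rank') keeps every column whose name contains "rank", so it would also pick up a rank_dict column if df2 already has one (for instance, when the snippet is run a second time). A sketch of a stricter selection using the regex parameter of filter, assuming the rank columns are always named rank_ followed by a number:

```python
import pandas as pd

df2 = pd.DataFrame({
    "id": [2223],
    "rank_1": ["www.yahoo.com"],
    "rank_2": ["www.whatever.com"],
    "rank_3": ["www.google.com"],
    "rank_dict": [{}],  # placeholder column, only here to show the difference
})

loose = df2.filter(like="rank")           # also selects rank_dict
strict = df2.filter(regex=r"^rank_\d+$")  # only rank_1, rank_2, rank_3

print(list(loose.columns))
print(list(strict.columns))
```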
Finally, zip the cols and dcts, then iterate over this zip iterator inside a list comprehension to create the required dictionary for each row according to the given criteria:
>>> df2
id date_created rank_1 rank_2 rank_3 rank_dict
0 2223 4/9/21 5:15 www.yahoo.com www.whatever.com www.google.com {'www.yahoo.com': 2, 'www.whatever.com': 0, 'www.google.com': 3}
1 1112 8/20/21 2:30 www.gm.com www.motor1.com www.webull.com {'www.gm.com': 0, 'www.motor1.com': 2, 'www.webull.com': 1}
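To try this end to end, the two frames from the question can be rebuilt by hand as in the sketch below. The construction code is an assumption about how the data is stored (in particular, that rank_dict holds real Python dictionaries rather than strings); the last three statements are the same approach as above.

```python
import pandas as pd

# Rebuild the question's sample data; rank_dict is assumed to hold dict objects.
df1 = pd.DataFrame({
    "id": [2223, 1112],
    "date_created": ["3/3/21 3:26", "2/25/21 1:35"],
    "rank_1": ["www.google.com", "www.autoblog.com"],
    "rank_2": ["www.yahoo.com", "www.motor1.com"],
    "rank_3": ["www.ford.com", "www.webull.com"],
    "rank_dict": [
        {"www.google.com": 3, "www.yahoo.com": 2, "www.ford.com": 1},
        {"www.autoblog.com": 3, "www.motor1.com": 2, "www.webull.com": 1},
    ],
})
df2 = pd.DataFrame({
    "id": [2223, 1112],
    "date_created": ["4/9/21 5:15", "8/20/21 2:30"],
    "rank_1": ["www.yahoo.com", "www.gm.com"],
    "rank_2": ["www.whatever.com", "www.motor1.com"],
    "rank_3": ["www.google.com", "www.webull.com"],
})

# Same three steps as in the answer.
dcts = df2["id"].map(df1.set_index("id")["rank_dict"])
cols = df2.filter(like="rank")
df2["rank_dict"] = [{u: dct.get(u, 0) for u in urls}
                    for urls, dct in zip(cols.values, dcts)]

print(df2["rank_dict"].tolist())
```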
Upvotes: 3