Reputation: 25
I have two dfs in python :
df1
folder_name name
0 f1 aa
1 g1 bb
df2
name icon
0 aa i1
1 bb i2
2 aadoq i3
3 bbaddd i4
Desired output:
df
folder_name name icon
0 f1 aa i1
1 g1 bb i2
2 f1 aadoq i3
3 g1 bbaddd i4
I tried merging them but it seemed wrong
pd.merge(df1,df2,on='name',how='right')
i am getting :
folder_name name icon
0 f1 aa i1
1 g1 bb i2
2 NAN aadoq i3
3 NAN bbaddd i4
So if the prefix string in df2 name column matches any name column itme in df1 i want that folder name to be there for that name column in output
Upvotes: 1
Views: 184
Reputation: 562
Based on your question, here are the prerequisites I got
Here is my Python code. I used the Python regex module. I selected the "name" list from both dataframes, checked if any name in dataframe 1 matches the name in dataframe 2 (match in python regex means to match from the beginning of the string). Created a new list called MappedName based on these criteria where there is a match use the matched name value from dataframe 1 if not, use the name value from dataframe 2. Added this list as a new column to dataframe 2. Used the 'name' column from dataframe 1 and 'MappedName' column from dataframe 2 for merge criteria.
I added one extra data point to the dataframe 2 to show what happens when there is no match with the regular expressions.
from pandas import DataFrame
import re
df1=DataFrame({'folder_name':['f1','g1'],'name':['aa','bb']})
df2=DataFrame({'name':['aa','bb','aadoq','bbaddd','ding'],'icon':['i1','i2','i3','i4','i5']})
df1_name_list=df1['name']
df2_name_list=df2['name']
MappedName=[]
for name2 in df2_name_list:
for name1 in df1_name_list:
if re.match(name1,name2):
name2=name1
break
MappedName.append(name2)
df2['MappedName']=MappedName
df3=df1.merge(df2,left_on='name',right_on='MappedName',how='right').drop(['name_x','MappedName'],axis=1)
df4=df1.merge(df2,left_on='name',right_on='MappedName').drop(['name_x','MappedName'],axis=1)
print ('\ndf1\n',df1)
print ('\ndf2\n',df2)
print ('\ndf3\n',df3)
print ('\ndf4\n',df4)
The result looks like below
df1
folder_name name
0 f1 aa
1 g1 bb
df2
name icon MappedName
0 aa i1 aa
1 bb i2 bb
2 aadoq i3 aa
3 bbaddd i4 bb
4 ding i5 ding
df3
folder_name name_y icon
0 f1 aa i1
1 f1 aadoq i3
2 g1 bb i2
3 g1 bbaddd i4
4 NaN ding i5
df4
folder_name name_y icon
0 f1 aa i1
1 f1 aadoq i3
2 g1 bb i2
3 g1 bbaddd i4
If you want NaN when you merge and there is no match, use the df3 example otherwise, use df4. For large datasets (millions of data points), this might not be efficient code.
Upvotes: 1
Reputation: 5955
I would explicitly add a column for prefix, merge, then remove the column after: (Note, this only works if all the data is as you posted, with a 2-character prefix)
df1=pd.DataFrame({'folder_name':['f1','d1'],'name':['aa','bb']})
df2=pd.DataFrame({'name':['aa','bb','aaq','bbdfg'],'icon':[1,2,3,4]})
df2['prefix']=df2['name'].str[:2]
name icon prefix
0 aa 1 aa
1 bb 2 bb
2 aaq 3 aa
3 bbdfg 4 bb
df3=df2.merge(df1, left_on='prefix', right_on='name').drop(['name_y','prefix'], axis=1)
df3
name_x icon folder_name
0 aa 1 f1
1 aaq 3 f1
2 bb 2 d1
3 bbdfg 4 d1
Upvotes: 1