Zephyr
Zephyr

Reputation: 1352

Creating a new column in a data frame by mapping multiple columns in pandas

I want to map one column of dataframe to another dataframe by using multiple columns. The sample dataframes are as follow:

df1 = pd.DataFrame()
df1['Date'] = ['2018-08-10','2018-08-10','2018-08-10','2018-08-10','2018-08-10',
               '2018-08-11','2018-08-11','2018-08-11','2018-08-12','2018-08-12',
               '2018-08-13','2018-08-12','2018-08-12','2018-08-12','2018-08-12']
df1['Name'] = ['A','A','A','A','A',
              'B','B','B','C','C',
              'D','D','E','F','F']
df1 ['Orgz'] = ['Alpha','Alpha','Alpha','Alpha','Alpha',
               'Beta','Beta','Beta','Gamma','Gamma',
               'Theta','Theta','Theta','Theta','Theta']
df2 = pd.DataFrame()
df2['Date'] = ['2018-08-10','2018-08-11','2018-08-12','2018-08-12','2018-08-13','2018-08-12','2018-08-12']
df2['Name'] = ['A','B','C','D','D','E','F']
df2 ['Orgz']= ['Alpha','Beta','Gamma','Theta','Theta','Theta','Theta']
df2 ['min'] = [60,85,90,77,25,20,30]

I want to map the column df2.min to df1 by creating a new column. However, I have three common columns which are Date, Name and Orgz to Map. In this case, I cannot just use Name to map it as there will be the same name at different date in my actual database. So I decided to use three columns (Date,Orgz and Name) to map with df2. The result will be assinged to a new column in df1. My actual result will look like as follow:

enter image description here

From the results, you will notice that some of the name are multiple at the same date. in that case, I just want to map the min value on the first row, the rest of the row are set to 0.

How can I do it?

Thanks, Zep

Upvotes: 1

Views: 978

Answers (1)

jezrael
jezrael

Reputation: 862481

I believe you need merge with left join and then set 0 by numpy.where with boolean mask created by DataFrame.duplicated:

df3 = df1.merge(df2, on=['Date','Name','Orgz'], how='left')
df3['min'] = np.where(df3.duplicated(subset=['Date','Name','Orgz']), 0, df3['min'])
print (df3)
          Date Name   Orgz  min
0   2018-08-10    A  Alpha   60
1   2018-08-10    A  Alpha    0
2   2018-08-10    A  Alpha    0
3   2018-08-10    A  Alpha    0
4   2018-08-10    A  Alpha    0
5   2018-08-11    B   Beta   85
6   2018-08-11    B   Beta    0
7   2018-08-11    B   Beta    0
8   2018-08-12    C  Gamma   90
9   2018-08-12    C  Gamma    0
10  2018-08-13    D  Theta   77
11  2018-08-13    D  Theta    0
12  2018-08-12    E  Theta   25
13  2018-08-12    F  Theta   30
14  2018-08-12    F  Theta    0

Upvotes: 1

Related Questions