Reputation: 23
I have the following pandas dataframe in Python (df1)
df1 = pd.DataFrame(np.array([["Jim", "1 4"], ["Jack", "1 2"], ["Joe", "2 3"]]),
columns=["name", "numeric position"])
I want to add a new column based on information from another pandas dataframe (df2)
df2 = pd.DataFrame(np.array([["Jim", 1, "Reverse"], ["Jim", 2, "Nearside"], ["Jim", 3, "Offside"], ["Jim", 4, "Front"],["Jack", 1, "Load1"], ["Jack", 2, "Load2"], ["Joe", 1, "Front"], ["Joe", 2, "Reverse"], ["Joe",3, "Driver"]]),
columns=["name", "number", "description"])
The desired outcome I want to be the following (df3)
df3 = pd.DataFrame(np.array([["Jim", "1 4", "Reverse,Front"], ["Jack", "1 2", "Load1,Load2"], ["Joe", "2 3", "Reverse,Driver"]]),
columns=["name", "numeric position", "location"])
So, I want to match the df1 and df2 dataframes based on the unique numeric values in the df1.'numeric position' and the rows from the df2.'number' for each 'name' and create a new column df1.'location' with commas between the different df2.'location'
I tried to split my enquiry in the above tasks but I cannot find a working solution as I have to extract information from the rows of the df2.
Could you please suggest a solution on my task? Thank you in advance!
Upvotes: 0
Views: 105
Reputation: 367
This should work too but it was a little longer.
# Split Numeric Position into separate columns
df1['numeric position_1'] = df1['numeric position'].str.split(' ').str.get(0)
df1['numeric position_2'] = df1['numeric position'].str.split(' ').str.get(1)
name numeric position numeric position_1 numeric position_2
0 Jim 1 4 1 4
1 Jack 1 2 1 2
2 Joe 2 3 2 3
# Merge df1 and df2 to get description 1
temp_df = pd.merge(df1,df2,left_on=['name','numeric position_1'],right_on=['name', 'number'])
name numeric position numeric position_1 numeric position_2 number description
0 Jim 1 4 1 4 1 Reverse
1 Jack 1 2 1 2 1 Load1
2 Joe 2 3 2 3 2 Reverse
# Merge temp_df and df2 to get description 2
temp_df = pd.merge(temp_df,df2,left_on=['name','numeric position_2'],right_on=['name', 'number'])
# Create location column
temp_df['location'] = temp_df['description_x'] + ',' + temp_df['description_y']
# Store final output in df3
df3 = temp_df[['name', 'numeric position', 'location']].copy()
Upvotes: 1
Reputation: 146
Quick and dirty solution
df1['location'] = df1.apply(lambda x: ','.join([df2[(df2['name'] == x['name']) & (df2['number'].map(int) == int(y))]['description'].iloc[0] for y in x['numeric position'].split(' ')]), axis=1)
df1
# name numeric position location
# 0 Jim 1 4 Reverse,Front
# 1 Jack 1 2 Load1,Load2
# 2 Joe 2 3 Reverse,Driver
or
df1['location'] = df1.apply(lambda x: ','.join(df2[(df2['name'] == x['name']) & (df2['number'].isin(x['numeric position'].split(' ')))]['description']), axis=1)
Upvotes: 1