Match DataFrames in Python and create a new column

I have the following pandas dataframe in Python (df1)

df1 = pd.DataFrame(np.array([["Jim", "1 4"], ["Jack", "1 2"], ["Joe", "2 3"]]),
                   columns=["name", "numeric position"])

I want to add a new column based on information from another pandas dataframe (df2)

df2 = pd.DataFrame(np.array([["Jim", 1, "Reverse"], ["Jim", 2, "Nearside"], ["Jim", 3, "Offside"], ["Jim", 4, "Front"],["Jack", 1, "Load1"], ["Jack", 2, "Load2"], ["Joe", 1, "Front"], ["Joe", 2, "Reverse"], ["Joe",3, "Driver"]]),
                   columns=["name", "number", "description"])

The desired outcome I want to be the following (df3)

df3 = pd.DataFrame(np.array([["Jim", "1 4", "Reverse,Front"], ["Jack", "1 2", "Load1,Load2"], ["Joe", "2 3", "Reverse,Driver"]]),
                   columns=["name", "numeric position", "location"])

So, I want to match the df1 and df2 dataframes based on the unique numeric values in the df1.'numeric position' and the rows from the df2.'number' for each 'name' and create a new column df1.'location' with commas between the different df2.'location'

I tried to split my enquiry in the above tasks but I cannot find a working solution as I have to extract information from the rows of the df2.

Could you please suggest a solution on my task? Thank you in advance!

Upvotes: 0

Views: 105

Answers (2)

Mike
Mike

Reputation: 367

This should work too but it was a little longer.

# Split Numeric Position into separate columns
df1['numeric position_1'] = df1['numeric position'].str.split(' ').str.get(0)
df1['numeric position_2'] = df1['numeric position'].str.split(' ').str.get(1)

    name    numeric position    numeric position_1  numeric position_2
0   Jim                  1 4                     1                 4
1   Jack                 1 2                     1                 2
2   Joe                  2 3                     2                 3

# Merge df1 and df2 to get description 1
temp_df = pd.merge(df1,df2,left_on=['name','numeric position_1'],right_on=['name', 'number'])

    name    numeric position    numeric position_1  numeric position_2  number  description
    0   Jim                  1 4                 1                   4    1  Reverse
    1   Jack                 1 2                 1                   2    1  Load1
    2   Joe                  2 3                 2                   3    2  Reverse

# Merge temp_df and df2 to get description 2
temp_df = pd.merge(temp_df,df2,left_on=['name','numeric position_2'],right_on=['name', 'number'])

# Create location column
temp_df['location'] = temp_df['description_x'] + ',' + temp_df['description_y']

# Store final output in df3
df3 = temp_df[['name', 'numeric position', 'location']].copy()

Upvotes: 1

Wazaa
Wazaa

Reputation: 146

Quick and dirty solution

df1['location'] = df1.apply(lambda x: ','.join([df2[(df2['name'] == x['name']) & (df2['number'].map(int) == int(y))]['description'].iloc[0] for y in x['numeric position'].split(' ')]), axis=1)

df1

#    name numeric position         location
# 0   Jim              1 4   Reverse,Front
# 1  Jack              1 2     Load1,Load2
# 2   Joe              2 3  Reverse,Driver

or

df1['location'] = df1.apply(lambda x: ','.join(df2[(df2['name'] == x['name']) & (df2['number'].isin(x['numeric position'].split(' ')))]['description']), axis=1)

Upvotes: 1

Related Questions