Reputation: 592
I have two pandas dataframes:
import pandas as pd
df1 = pd.DataFrame({'id': [101, 102, 103, 104, 105, 106],
'code': [1, 2, 3, 4, 5, 2],
'address_name':['ESPLANADA DOM CARLOS',
'CHAMP DE MARS 5 AV ANATOLE',
'PARQUE NACIONAL DA TIJUCA',
'C D ARISTIDES MAILLOL',
'CHAMP DE MARS 5 AV ANATOLE',
'CHAMP DE MARS 5 AV ANATOLE']})
print(df1)
id code address_name
101 1 ESPLANADA DOM CARLOS
102 2 CHAMP DE MARS 5 AV ANATOLE
103 3 PARQUE NACIONAL DA TIJUCA
104 4 C D ARISTIDES MAILLOL
105 5 CHAMP DE MARS 5 AV ANATOLE
106 2 CHAMP DE MARS 5 AV ANATOLE
df2 = pd.DataFrame({'cod': [5, 2, 1],
'population': [6748, 6748, 504],
'address':['PARQUE NACIONAL DA TIJUCA',
'CHAMP DE MARS 5 AV ANATOLE',
'ESPLANADA DOM CARLOS']})
print(df2)
cod population address
5 6748 PARQUE NACIONAL DA TIJUCA
2 6748 CHAMP DE MARS 5 AV ANATOLE
1 504 ESPLANADA DOM CARLOS
I would like to make a join between the dataframes where the 'address_name' of df_1 is equal to the 'address' of df2 and the 'code' of df2 was the same as the 'cod' of df2. If both were true then I would like to save in a new column what are the 'id's that this happened.
I tried implementing the following code using merge():
pd.merge(df2, df1[['address_name', 'code']], left_on="address", right_on="address_name", how='left')
The (wrong) output is being:
cod population address address_name code
5 6748 PARQUE NACIONAL DA TIJUCA PARQUE NACIONAL DA TIJUCA 3
2 6748 CHAMP DE MARS 5 AV ANATOLE CHAMP DE MARS 5 AV ANATOLE 2
2 6748 CHAMP DE MARS 5 AV ANATOLE CHAMP DE MARS 5 AV ANATOLE 5
2 6748 CHAMP DE MARS 5 AV ANATOLE CHAMP DE MARS 5 AV ANATOLE 2
1 504 ESPLANADA DOM CARLOS ESPLANADA DOM CARLOS 1
I would like the output to be:
cod population address address_name code id
2 6748 CHAMP DE MARS 5 AV ANATOLE CHAMP DE MARS 5 AV ANATOLE 2 [102,106]
1 504 ESPLANADA DOM CARLOS ESPLANADA DOM CARLOS 1 [101]
Upvotes: 0
Views: 52
Reputation:
merge
on two columns and groupby
all columns except "id" and create a list of "id"s:
merged_df = df1.merge(df2, left_on=['address_name','code'], right_on=['address','cod'])
out = merged_df.groupby(merged_df.drop('id', axis=1).columns.tolist())['id'].apply(list).reset_index()
Output:
code address_name cod population address
0 1 ESPLANADA DOM CARLOS 1 504 ESPLANADA DOM CARLOS [101]
1 2 CHAMP DE MARS 5 AV ANATOLE 2 6748 CHAMP DE MARS 5 AV ANATOLE [102, 106]
Upvotes: 1