Mitchell Anderson
Mitchell Anderson

Reputation: 13

Python Pandas Lookup another Dataframe return Multiple Matches

I have a dataframe (customers) of customers, with a unique id.

I need to look at another dataframe (meetings) of meetings that have been held (lookup using the customer id) and return the date of the most recent meeting.

Most customers will have had multiple meetings, but some customers will have had none. In this case, I need to return 0.

Customers
id      name
1607    duck
1622    dog
1972    cat
2204    bird
2367    fish
2373    elephant
2386    moose
2413    mammal
2418    man
22120   goldfish
6067    toucan
83340   capybara

meetings as below:

meetings
customer_id date       meeting_id
1607        25/02/2019  1235
1607        11/03/2019  2315
1607        11/03/2019  5483
1622        16/11/2018  32125
1972        13/02/2019  6548
2204        4/02/2019   6542
2204        8/11/2018   8755    
2367        22/01/2019  6545
2373        14/12/2018  8766    
2373        18/01/2019  5448
2386        18/02/2019  32125
2386        18/02/2019  5458
2413        6/12/2018   31125
2413        5/03/2019   5183
2418        21/01/2019  3158
2418        23/01/2019  3127
2418        24/01/2019  7878
2418        21/01/2019  7894
2418        31/01/2019  7895
2418        6/03/2019   4548

I want to return the customers table, with another column showing the most recent meeting and its meeting_id, as below:

id      name        most_recent  most_recent_id
1607    duck        11/03/2019   xxxx
1622    dog         16/11/2018   xxxxx
1972    cat         13/02/2019   xxxx
2204    bird        4/02/2019    etc
2367    fish        22/01/2019
2373    elephant    18/01/2019
2386    moose       18/02/2019
2413    mammal      5/03/2019
2418    man         6/03/2019
22120   goldfish    0
6067    toucan      0
83340   capybara    0

have tried a couple of different ways, by looping through the df's etc, but haven't got anything that works any help appreciated! thanks.

Upvotes: 1

Views: 503

Answers (1)

Mohamed Thasin ah
Mohamed Thasin ah

Reputation: 11192

try this,

df2=df2.drop_duplicates(subset=['customer_id'],keep='last')
pd.merge(df1, df2, left_on=['id'], right_on=['customer_id'], how='left').rename(columns={'date':'most_recent','meeting_id':'most_recent_id'}).drop('customer_id',1).fillna(0)

you just need to remove all the duplicate records from df2 and keep last(latest) record. then apply left merge.

O/P:

       id      name most_recent  most_recent_id
0    1607      duck  11/03/2019          5483.0
1    1622       dog  16/11/2018         32125.0
2    1972       cat  13/02/2019          6548.0
3    2204      bird   8/11/2018          8755.0
4    2367      fish  22/01/2019          6545.0
5    2373  elephant  18/01/2019          5448.0
6    2386     moose  18/02/2019          5458.0
7    2413    mammal   5/03/2019          5183.0
8    2418       man   6/03/2019          4548.0
9   22120  goldfish           0             0.0
10   6067    toucan           0             0.0
11  83340  capybara           0             0.0

Upvotes: 1

Related Questions