Resolve duplicates after Pandas merge

Question

I have two Pandas Dataframes, named table_A and table_B:

table_A

ID	COST	DATE	MONTH
CLSLT88150	655.118	2019-12-30	1
CLSLT88150	652.758	2019-12-30	1

table_B

ID	NUMBER	ORDER	DEVICE
CLSLT88150	4.45882e+07	WO-SLT-GLI-REC-0009-5067	Apple iPhone XR
CLSLT88150	4.45882e+07	WO-SLT-GLI-REC-0009-5786	Apple iPhone XR

I want to merge these two tables on column ID to obtain a result similar to this table:

table_merge

ID	COST	DATE	MONTH	ORDER	NUMBER	DEVICE
CLSLT88150	655.118	2019-12-30	1	WO-SLT-GLI-REC-0009-5067	4.45882e+07	Apple iPhone XR
CLSLT88150	652.758	2019-12-30	1	WO-SLT-GLI-REC-0009-5786	4.45882e+07	Apple iPhone XR

Each ID can have multiple ORDER. Each ORDER can only appear once and has a unique COST associated. It is indifferent which COST is associated with which ORDER, so this table could also be an acceptable result:

ID	COST	DATE	MONTH	ORDER	NUMBER	DEVICE
CLSLT88150	652.758	2019-12-30	1	WO-SLT-GLI-REC-0009-5067	4.45882e+07	Apple iPhone XR
CLSLT88150	655.118	2019-12-30	1	WO-SLT-GLI-REC-0009-5786	4.45882e+07	Apple iPhone XR

I have used the following script to merge table_A with table_B:

table_merge = table_A.merge(table_B)

Obtaining this undesired result:

ID	COST	DATE	MONTH	ORDER	NUMBER	DEVICE
CLSLT88150	655.118	2019-12-30	1	WO-SLT-GLI-REC-0009-5067	4.45882e+07	Apple iPhone XR
CLSLT88150	655.118	2019-12-30	1	WO-SLT-GLI-REC-0009-5786	4.45882e+07	Apple iPhone XR
CLSLT88150	652.758	2019-12-30	1	WO-SLT-GLI-REC-0009-5067	4.45882e+07	Apple iPhone XR
CLSLT88150	652.758	2019-12-30	1	WO-SLT-GLI-REC-0009-5786	4.45882e+07	Apple iPhone XR

I tried and failed to remove the undesired rows with this script:

table_merge = table_merge.drop_duplicates(subset=['ORDER','ID'])

After dropping duplicates table_merge looks like this:

ID	COST	DATE	MONTH	ORDER	NUMBER	DEVICE
CLSLT88150	655.118	2019-12-30	1	WO-SLT-GLI-REC-0009-5067	4.45882e+07	Apple iPhone XR
CLSLT88150	655.118	2019-12-30	1	WO-SLT-GLI-REC-0009-5786	4.45882e+07	Apple iPhone XR

As it can be appreciated the COST 655.118 is repeated and the other COST of 652.758 is missing.

How can I obtain the desired result?

Thank you.

Resolve duplicates after Pandas merge

Answers (1)

Related Questions