Comparing Dataframe with different amount of rows and columns

Question

I have a data frame A like this:

CARD_NUMBER	TRANSACTION_AMOUNT	TERMINAL_ID	EXTERNAL_STAN	TRANSACTION_DATE
604861*********3947	40000	Y6152114	686817	21/01/2021 09:53:41
440833******4229	200000	Y6152114	686821	21/01/2021 10:04:05
425322******8727	200000	Y6152114	686847	21/01/2021 12:06:25
406606******7268	300000	Y6152114	207606	21/01/2021 15:16:33
425322******5923	300000	Y6152114	683213	21/01/2021 16:25:04

And another data frame B which looks like this:

MICROFILM_REF_NUMBER	LINKUP_CODE	REVERSAL_FLAG	MCC	TRANSACTION_AMOUNT	TRANSACTION_CURRENCY	TRANSACTION_CURRENCY_EXP	TRANSACTION_DATE	ISSUER_ID	ACQUIRER_ICA	CLEARING_BIN	ACQUIRER_BANK_CODE	MERCHANT_COUNTRY_CODE	MERCHANT_ACRONYM	OUTLET_NUMBER	TERMINAL_ID	MERCHANT_TRANSACTION_SIGN	ISSUER_BANK_CODE	CLIENT_CODE	CARD_NUMBER	EXPIRY_DATE	TRANSACTION_SIGN	BILLING_AMOUNT	BILLING_CURRENCY_EXP	NETWORK_CODE	SETTLEMENT_AMOUNT	SETTLEMENT_CURRENCY	SETTLEMENT_CURRENCY_EXP	FUNCTION_CODE	AUTHORIZATION_CODE	AUTHORIZATION_DATE	REFERENCE_NUMBER	INTERNAL_STAN	EXTERNAL_STAN
70000021024100188324015	AUTH10401003	N	6011	509,47	978	2	26/01/2021 18:39	515880	12658	12658	000003	710	CALENDAR GIRLS BOYS JOHANNESBURG ZA	000045	000045	C	ISS003	0360835902	5158800746440660	01/01/2021	D	509,47	2	02	509,47	978	2	205	407309	26/01/2021 17:55	407309407309	670676	407309
70000021024100188324320	AUTH10401003	N	6011	3038,85	978	2	26/01/2021 17:59	515880	12658	12658	000003	710	CALENDAR GIRLS BOYS JOHANNESBURG ZA	000045	000045	C	ISS003	0348497473	5158800034404519	01/01/2021	D	3038,85	2	02	3038,85	978	2	205	763004	26/01/2021 17:16	763004763004	456406	763004
70000021024100188324064	AUTH10401003	N	6011	6762,29	978	2	26/01/2021 18:12	515880	12658	12658	000003	710	CALENDAR GIRLS BOYS JOHANNESBURG ZA	000045	000045	C	ISS003	0390027716	5158800663287854	01/01/2021	D	6762,29	2	02	6762,29	978	2	205	001532	26/01/2021 17:29	001532001532	536033	001532
70000021024100188324130	AUTH10401003	N	6011	431,87	978	2	26/01/2021 17:35	515880	12658	12658	000003	710	CALENDAR GIRLS BOYS JOHANNESBURG ZA	000045	000045	C	ISS003	0360236142	5158800688202193	01/01/2021	D	431,87	2	02	431,87	978	2	205	258257	26/01/2021 16:52	258257258257	286480	258257

I want to add a column 'Match' to data frame A so that one of the rows that exist from data frame A exits in data frame B then 'Match' is True, otherwise, it is False.

I also want to generate a file with the number of rows that matched and the number of rows that didn't.

Azzedine · Accepted Answer

One possible solution to your problem is to use the merge function offer by Pandas. If you're familiar with SQL, this is just like a table join. The pd.merge adds the columns from df1 to df2 by checking where the two data frames match on the names of the columns. This can be accomplished using the following function :

df3=df1.merge(df2, on=['col1','col2','col3'], how='left', indicator='Match')

For example, if df1 was :

FIRST_NAME	LAST_NAME	SALARY
ahmed	mobarki	20000.0
mourad	melodi	96000.0
adam	karimi	25000.0
fatima	barkaoui	30000.0
david	mahri	40500.0
said	melloki	96400.0
ayoube	sharfi	36420.0
john	salali	97500.0
hajar	mously	52400.0

and df2 was defined as:

Perform Left JOIN with an indicator.

df3 = df1.merge(df2, on=['FIRST_NAME','LAST_NAME','salary'], how='left', indicator='Match')

setting indicator=Match adds a column to the merged DataFame named as Match where the value of each row can be one of three possible values: left_only, right_only, or both.

we are interested when Match is equal both , because it means that the observations merge key is found in both DataFrames (we will then convert the both to True, or else it will be False).

df3['Match'] = df3['Match'] == 'both'

The df3 will look like this :

Now let's back to your dataframes A and B:

C=A.merge(B, on=['CARD_NUMBER','TRANSACTION_AMOUNT','TERMINAL_ID', 'EXTERNAL_STAN', 'TRANSACTION_DATE'], how='left', indicator='Match')

C['Match'] = C['Match'] == 'both'
# create a new column Match in dataframe A  and fill it from C
A["Match"] = C['Match']

Comparing Dataframe with different amount of rows and columns

Answers (1)

Related Questions