Lucas Dadalt
Lucas Dadalt

Reputation: 91

Comparing dataframes with different sizes in pandas and create new column based on comparison

I am importing via pyodbc 2 dataframes: df1 and df2.

They are big and I need to compare them.

PS: They don't have the same size.

3 columns size Dataframes:

Sizes and format of dataframes

What I want to do:

If CODUSU columns are equal then df2[Situação] = 'K'

What I have done, but it's too slow:

for i in range(0,len(df2)):
  for k in range(0,len(df1)):
    if df2.loc[i][0] == df1.loc[k][0]:
        df2[i]["Situação"] = "K"

I also need to:

If in item in df1[CODUSU] and not in df2[CODUSU] then df2[Situação] = 'Q'

If in item in df2[CODUSU] and not in df1[CODUSU] then df2[Situação] = 'B'

Upvotes: 0

Views: 155

Answers (1)

Serge Ballesta
Serge Ballesta

Reputation: 149175

You should do an outer merge on the dataframes asking for the indicator variable:

resul = df2.merge(df1, how='outer', on='A', suffixes=('', '_y'),indicator=True)

Do not forget the DTDOSE column for rows coming from df1 only:

resul.loc[resul['indicator'] == 'right_only', 'DTDOSE'] = resul.loc[
                                  resul['indicator'] == 'right_only', 'DTDOSE_y']

Time to compute the new values for the Situação column:

resul.loc[resul['indicator'] == 'both', 'Situação'] = 'K'
resul.loc[resul['indicator'] == 'right_only', 'Situação'] = 'Q'
resul.loc[resul['indicator'] == 'left_only', 'Situação'] = 'B'

And finally get rid of the auxiliary columns:

resul = resul[['CODUSU', 'DTDOSE', 'Situação']

Upvotes: 1

Related Questions