Little
Little

Reputation: 3477

Remove rows from a couple of dataframes with equal values

I have two data frames that I obtained from a couple of csv files. For example:

df1
    data1   data2   data3
0   cow     cat     53
1   girl    boy     12  
2   monkey  island  30
3   lucas   arts    14

df2
    data1   data2   data3
0   girl    boy     50
1   cover   disc    45  
2   girl    boy     47
3   pen     pencil  15
4   book    note    30
5   lucas   arts    15


df2
    data1   data2   data3
0   cover   disc    45
1   pen     pencil  15
2   book    note    30

I would like to delete from df2 those rows that have the same values of data1 and data2 from df1, so that at the end I will end up with an updated df2. So far, I have done the following:

    file1="fileA.csv"
    file2="fileB.csv"
    df1=pd.read_csv(file1)
    df2=pd.read_csv(file2)
    cond = df1[['data1','data2']].isin(df2[['data1','data2']])
    df2.drop(df1[cond].index,inplace=True)
    df2.to_csv("fileBUpdtated.csv",index=False)

But I am not getting the results that I need, what am I missing?

Upvotes: 3

Views: 736

Answers (4)

BENY
BENY

Reputation: 323276

Let us try

out = df2[df2[['data1', 'data2']].apply(tuple,1).isin(df1[['data1', 'data2']].apply(tuple,1))]

Upvotes: 0

Joe Ferndz
Joe Ferndz

Reputation: 8508

The best option I found to do this is the following:

print (df2[~df2.isin(df1[['data1','data2']])])

Here, you are searching the values in each row of df2 (superset dataframe) against the values in df1 (2 specific columns in subset dataframe). If they are not found, then you want to print those records from df2.

This one does not do a merge. Instead, it is using boolean and index.

The output of this will be:

   data1   data2  data3
1  cover    disc     45
3    pen  pencil     15
4   book    note     30

Alternate, you can do the below:

You can find the common items in both, then exclude them and print the values.

If you want to remove them from df2, then reassign it back to df2.

common = df2.merge(df1,on=['data1','data2'])
df2 = df2[(~df2.data1.isin(common.data1))&(~df2.data2.isin(common.data2))]
print (df2)

The output of this will be:

   data1   data2  data3
1  cover    disc     45
3    pen  pencil     15
4   book    note     30

Upvotes: 1

Michael Szczesny
Michael Szczesny

Reputation: 5036

You can append df1 to df2, then drop_duplicates and keep the rows that are left and came from df2

df2.append(df1, ignore_index=True).drop_duplicates(['data1','data2'], keep=False).loc[:df2.index.max()]

Out:

   data1   data2  data3
1  cover    disc     45
3    pen  pencil     15
4   book    note     30

Upvotes: 2

ansev
ansev

Reputation: 30920

You need DataFrame.merge

cols = ['data1', 'data2']
df2 = df2.merge(df1[cols], on=cols, how='left', indicator=True)\
    .loc[lambda x: x._merge.ne('both')].drop('_merge', axis=1)
print(df2)


   data1   data2  data3
1  cover    disc     45
3    pen  pencil     15
4   book    note     30

Upvotes: 6

Related Questions