Marc Schwambach
Marc Schwambach

Reputation: 438

Creating a new dataframe from existing ones based on logical not

I believe that my problem is really straightforward and there must be an really easy way to solve this issue, however as I am quite new with Python, I could not sort it out by my own an elegant solution. This question is very likely duplicated, however I could not find it, so please, if you have seen another question to my problem, take this as another way to elaborate on it.

So, let's say I have the following dataframe:

import pandas as pd
import matplotlib.pyplot as plt

data = {'Column A': [300,300,450,500,500,750,600,300,800],
'Column B': [1,1,2,2,3,3,0,2,3], 
'Column C': ["Value_1", "Value_2", "Value_3", "Value_4", "Value_1", 
"Value_2", "Value_3", "Value_4", "Value_5"]}
df = pd.DataFrame(data, columns=['Column A','Column B', 'Column C'])

From there, I take two other dataframes:

df1 = df.loc[(df['Column C'] == "Value_4")]

df2 = df.loc[(df['Column C'] == "Value_3")]

Now I want to create a third dataframe (df3) which is composed by the rows present in df (existing dataframe) that ARE NOT in df1 and df2. I don't want to extract or change anything on the existing dataframe (df).

Hope that I managed to be succinct and precise. I would really appreciate your help on this one!

Upvotes: 1

Views: 1139

Answers (5)

Prajit Karande
Prajit Karande

Reputation: 131

Simple way to do this is:

  1. first you have to set your column name with no space.

  1. you have to get unique values of Column_C for df1 and df2 and create list.

    l=[df1['Column_C'].unique()[0],df2['Column_C'].unique()[0]]
    

  1. then use NOT IN with df.

    df3 = df[~df.Column_C.isin(l)]
    

Upvotes: 4

ansev
ansev

Reputation: 30920

You can do:

df3 = df.loc[~(df['Column C'] == "Value_4") & ~(df['Column C'] == "Value_3")]

Upvotes: 1

jezrael
jezrael

Reputation: 862791

Use Index.intersection with Index.union and select rows by DataFrame.loc:

df3 = df.loc[df.index.difference(df1.index.union(df2.index))]
print (df3)
   Column A  Column B Column C
0       300         1  Value_1
1       300         1  Value_2
4       500         3  Value_1
5       750         3  Value_2
8       800         3  Value_5

Another idea is use Series.isin with both value used for filtering and invert condition by ~:

df3 = df[~df['Column C'].isin(['Value_4','Value_3'])]

Or if need test all columns by values for membership is possible use indicator=True in merge with concated both DataFrames:

s = df.merge(pd.concat([df1, df2]).drop_duplicates(), how='left', indicator=True)['_merge']

df3 = df[s == 'left_only']
print (df3)
   Column A  Column B Column C
0       300         1  Value_1
1       300         1  Value_2
4       500         3  Value_1
5       750         3  Value_2
8       800         3  Value_5

Upvotes: 1

Ankur Sinha
Ankur Sinha

Reputation: 6649

If I understand, may be you can do this by using isin and negating it using ~ (there are many ways to get to your solution, I post one of them):

df3 = df.loc[~(df['Column C'].isin(df1['Column C'])) & ~(df['Column C'].isin(df2['Column C']))]

What we do is, we check for rows from df which are not in df1 for Column C, the same for df and df2 with Column C. You are left with other rows then as shown below.

Output:

   Column A  Column B Column C
0       300         1  Value_1
1       300         1  Value_2
4       500         3  Value_1
5       750         3  Value_2
8       800         3  Value_5

Upvotes: 1

Allen Qin
Allen Qin

Reputation: 19947

You can remove index of df2+df3 from df.index and use the diff to reindex df:

df.reindex(df.index.difference(df1.index.union(df2.index)))

    Column A    Column B    Column C
0   300         1           Value_1
1   300         1           Value_2
4   500         3           Value_1
5   750         3           Value_2
8   800         3           Value_5

Upvotes: 1

Related Questions