Reputation: 438
I believe that my problem is really straightforward and there must be an really easy way to solve this issue, however as I am quite new with Python, I could not sort it out by my own an elegant solution. This question is very likely duplicated, however I could not find it, so please, if you have seen another question to my problem, take this as another way to elaborate on it.
So, let's say I have the following dataframe:
import pandas as pd
import matplotlib.pyplot as plt
data = {'Column A': [300,300,450,500,500,750,600,300,800],
'Column B': [1,1,2,2,3,3,0,2,3],
'Column C': ["Value_1", "Value_2", "Value_3", "Value_4", "Value_1",
"Value_2", "Value_3", "Value_4", "Value_5"]}
df = pd.DataFrame(data, columns=['Column A','Column B', 'Column C'])
From there, I take two other dataframes:
df1 = df.loc[(df['Column C'] == "Value_4")]
df2 = df.loc[(df['Column C'] == "Value_3")]
Now I want to create a third dataframe (df3) which is composed by the rows present in df (existing dataframe) that ARE NOT in df1 and df2. I don't want to extract or change anything on the existing dataframe (df).
Hope that I managed to be succinct and precise. I would really appreciate your help on this one!
Upvotes: 1
Views: 1139
Reputation: 131
Simple way to do this is:
you have to get unique
values of Column_C
for df1
and df2
and create list
.
l=[df1['Column_C'].unique()[0],df2['Column_C'].unique()[0]]
then use NOT IN with df.
df3 = df[~df.Column_C.isin(l)]
Upvotes: 4
Reputation: 30920
You can do:
df3 = df.loc[~(df['Column C'] == "Value_4") & ~(df['Column C'] == "Value_3")]
Upvotes: 1
Reputation: 862791
Use Index.intersection
with
Index.union
and select rows by DataFrame.loc
:
df3 = df.loc[df.index.difference(df1.index.union(df2.index))]
print (df3)
Column A Column B Column C
0 300 1 Value_1
1 300 1 Value_2
4 500 3 Value_1
5 750 3 Value_2
8 800 3 Value_5
Another idea is use Series.isin
with both value used for filtering and invert condition by ~
:
df3 = df[~df['Column C'].isin(['Value_4','Value_3'])]
Or if need test all columns by values for membership is possible use indicator=True
in merge
with concat
ed both DataFrames:
s = df.merge(pd.concat([df1, df2]).drop_duplicates(), how='left', indicator=True)['_merge']
df3 = df[s == 'left_only']
print (df3)
Column A Column B Column C
0 300 1 Value_1
1 300 1 Value_2
4 500 3 Value_1
5 750 3 Value_2
8 800 3 Value_5
Upvotes: 1
Reputation: 6649
If I understand, may be you can do this by using isin and negating it using ~
(there are many ways to get to your solution, I post one of them):
df3 = df.loc[~(df['Column C'].isin(df1['Column C'])) & ~(df['Column C'].isin(df2['Column C']))]
What we do is, we check for rows from df which are not in df1 for Column C, the same for df and df2 with Column C. You are left with other rows then as shown below.
Output:
Column A Column B Column C
0 300 1 Value_1
1 300 1 Value_2
4 500 3 Value_1
5 750 3 Value_2
8 800 3 Value_5
Upvotes: 1
Reputation: 19947
You can remove index of df2+df3 from df.index and use the diff to reindex df:
df.reindex(df.index.difference(df1.index.union(df2.index)))
Column A Column B Column C
0 300 1 Value_1
1 300 1 Value_2
4 500 3 Value_1
5 750 3 Value_2
8 800 3 Value_5
Upvotes: 1