Reputation: 2852
I have a Base dataframe with 4 columns.
column_A column_B column_C id
0 1 1 anna 123
1 2 1 anna 7
2 30 2 bob 42
2 20 2 bob 12
3 10 3 charlie 1
4 100 3 david 2
I want to split it into 2 different dataframes with the following properties.
Dataframe 1:
column_A column_B column_C id
0 1 1 anna 123
1 2 1 anna 7
2 30 2 bob 42
2 20 2 bob 12
where both values in column_B column_C match
Dataframe 2:
column_A column_B column_C id
3 10 3 charlie 1
4 100 3 david 2
where only values in column_B match
Upvotes: 1
Views: 203
Reputation: 6669
To keep the records without duplicates (row 5 and 6), use drop_duplicates
function:
dfA = df.drop_duplicates(subset = ['column_B', 'column_C'], keep = False)
Output:
column_A column_B column_C column_D
4 10 3 charlie 1
5 100 3 davis 2
To keep the records with duplicates (rows 1 to 4), use duplicated
function:
dfB = df[df.duplicated(subset = ['column_B', 'column_C'], keep = False)]
Output:
column_A column_B column_C column_D
0 1 1 anna 123
1 2 1 anna 7
2 30 2 bob 42
3 20 2 bob 12
Upvotes: 2
Reputation: 76947
You could check for duplicates.
In [200]: dfs = {i: n for i, n in df.groupby(
df.duplicated(subset=['column_B', 'column_C'], keep=False))}
In [201]: dfs[True]
Out[201]:
column_A column_B column_C id
0 1 1 anna 123
1 2 1 anna 7
2 30 2 bob 42
2 20 2 bob 12
In [202]: dfs[False]
Out[202]:
column_A column_B column_C id
3 10 3 charlie 1
4 100 3 david 2
Upvotes: 3