Prakhar Agarwal
Prakhar Agarwal

Reputation: 2852

I want to split Pandas Dataframe into 2 dataframes based on a condition

I have a Base dataframe with 4 columns.

 column_A column_B  column_C   id  
0       1       1      anna    123
1       2       1      anna      7
2      30       2      bob      42
2      20       2      bob      12
3      10       3      charlie   1
4     100       3      david     2

I want to split it into 2 different dataframes with the following properties.

Dataframe 1:

 column_A column_B  column_C   id  
0       1       1      anna    123
1       2       1      anna      7
2      30       2      bob      42
2      20       2      bob      12

where both values in column_B column_C match

Dataframe 2:

  column_A column_B  column_C   id
3      10       3      charlie   1
4     100       3      david     2

where only values in column_B match

Upvotes: 1

Views: 203

Answers (2)

Ankur Sinha
Ankur Sinha

Reputation: 6669

To keep the records without duplicates (row 5 and 6), use drop_duplicates function:

dfA = df.drop_duplicates(subset = ['column_B', 'column_C'], keep = False)

Output:

   column_A  column_B column_C  column_D
4        10         3  charlie         1
5       100         3    davis         2

To keep the records with duplicates (rows 1 to 4), use duplicated function:

dfB = df[df.duplicated(subset = ['column_B', 'column_C'], keep = False)]

Output:

   column_A  column_B column_C  column_D
0         1         1     anna       123
1         2         1     anna         7
2        30         2      bob        42
3        20         2      bob        12

Upvotes: 2

Zero
Zero

Reputation: 76947

You could check for duplicates.

In [200]: dfs = {i: n for i, n in df.groupby(
                    df.duplicated(subset=['column_B', 'column_C'], keep=False))}

In [201]: dfs[True]
Out[201]:
   column_A  column_B column_C   id
0         1         1     anna  123
1         2         1     anna    7
2        30         2      bob   42
2        20         2      bob   12

In [202]: dfs[False]
Out[202]:
   column_A  column_B column_C  id
3        10         3  charlie   1
4       100         3    david   2

Upvotes: 3

Related Questions