Reputation: 2860
I have a pandas DataFrame, dft.
I want to split it into two separate DataFrames based on whether a row's combination of 'O', 'A', 'N', 'value_next' is unique
or not. So I did this:
mask = dft.groupby(['O', 'A', 'N', 'value_next']).filter(lambda x: len(x) <= 1)
df1 = dft[mask]
df2 = dft[~mask]
But the line df1 = dft[mask]
raises this error:
ValueError: Boolean array expected for the condition, not int64
What am I missing?
Upvotes: 0
Views: 488
Reputation: 4548
Here is a slightly different approach using .duplicated
instead of groupby/filter, which can be really slow if you have a large dft. Note keep=False,
which marks every row of a duplicated group as True, instead of leaving the first occurrence unmarked, which is the default behavior.
import pandas as pd
import numpy as np
num_rows = 100
np.random.seed(1)
# Create a test df
dft = pd.DataFrame({
    'time': np.random.randint(5, 25, num_rows),
    'O': np.random.randint(1, 4, num_rows),
    'A': np.random.randint(1, 4, num_rows),
    'N': np.random.randint(1, 4, num_rows),
    'value': np.random.randint(10, 100, num_rows),
    'value_next': np.random.randint(-10, 40, num_rows),
})
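# Boolean mask: True if the row's key combination occurs more than once, False otherwise
is_dup = dft.duplicated(['O', 'A', 'N', 'value_next'], keep=False)
df1 = dft[~is_dup]  # rows whose key combination is unique
df2 = dft[is_dup]   # rows whose key combination is repeated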
print(df2)
# Quick check that a row in df2 was originally duplicated
print(dft[
    dft['O'].eq(2) &
    dft['A'].eq(3) &
    dft['N'].eq(1) &
    dft['value_next'].eq(8)
])
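As for the error in the question: groupby(...).filter(...) returns the rows of the groups that pass the test (a DataFrame), not a boolean mask, so its result cannot be used in dft[mask] or negated with ~. If you would rather stay with groupby, one option is to build a row-aligned mask with transform. The following is only a minimal sketch on a small made-up frame (the name toy and its values are hypothetical; only the column names come from the question):

```python
import pandas as pd

# Hypothetical toy frame; only the column names come from the question.
toy = pd.DataFrame({
    'O': [1, 1, 2, 3],
    'A': [1, 1, 2, 3],
    'N': [1, 1, 2, 3],
    'value_next': [5, 5, 7, 9],
    'value': [10, 20, 30, 40],
})
keys = ['O', 'A', 'N', 'value_next']

# filter() already returns the surviving rows, so it is a DataFrame, not a mask.
unique_rows = toy.groupby(keys).filter(lambda g: len(g) <= 1)

# transform('size') broadcasts each group's size back onto every row,
# which gives a boolean Series aligned with toy's index.
group_size = toy.groupby(keys)['value'].transform('size')
is_unique = group_size.eq(1)

df1 = toy[is_unique]    # rows whose key combination occurs once
df2 = toy[~is_unique]   # rows whose key combination is repeated
```

On this toy data the transform-based mask selects the same rows as ~toy.duplicated(keys, keep=False). The .duplicated version avoids the groupby entirely, which is why it is the one suggested above.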
Upvotes: 1