Reputation: 374
I have a df that contains values at various time points. I have two separate columns which should display a single set of unique values for each time point. This occurs for the most part but sometimes time points contain multiple unique values. I'm hoping to filter these using conditional logic.
For the df below, there are unique time points in Time
. I only want to have one set of unique values for Value
and Object
for each time point. If two unique items occur in Value
I want to drop X
and keep Y
, regardless of what is in Object
. If there are two unique items in Object
then I want to keep the first row. In this instance, the items will be the same in Value
.
df = pd.DataFrame({
'Time' : ['2019-08-02 09:50:10.1','2019-08-02 09:50:10.1','2019-08-02 09:50:10.2','2019-08-02 09:50:10.2','2019-08-02 09:50:10.3','2019-08-02 09:50:10.3','2019-08-02 09:50:10.4','2019-08-02 09:50:10.4','2019-08-02 09:50:10.6','2019-08-02 09:50:10.6'],
'Object' : ['A','A','B','B','C','A','C','B','B','B'],
'Value' : ['X','X',np.nan,np.nan,'Y','X','Y','Y','Z','Z'],
})
I initially considered returning the rows that contain more than one unique set of items at each time point, which could be used for filtering.
unq_Object = df.groupby('Time').apply(lambda x: x['Object'].unique())
unq_Value = df.groupby('Time').apply(lambda x: x['Value'].unique())
But I'm not sure if there is a more efficient method?
Intended output:
Time Object Value
0 2019-08-02 09:50:10.1 A X
1 2019-08-02 09:50:10.1 A X
2 2019-08-02 09:50:10.2 B NaN
3 2019-08-02 09:50:10.2 B NaN
4 2019-08-02 09:50:10.3 C Y
5 2019-08-02 09:50:10.4 C Y
6 2019-08-02 09:50:10.6 B Z
7 2019-08-02 09:50:10.6 B Z
Upvotes: 1
Views: 344
Reputation: 323396
Update use duplicate
df[df.duplicated(keep=False)|df.index.isin(df.groupby('Time').head(1).index)]
Out[187]:
Time Object Value
0 2019-08-02 09:50:10.1 A X
1 2019-08-02 09:50:10.1 A X
2 2019-08-02 09:50:10.2 B NaN
3 2019-08-02 09:50:10.2 B NaN
4 2019-08-02 09:50:10.3 C Y
6 2019-08-02 09:50:10.4 C Y
8 2019-08-02 09:50:10.6 B Z
9 2019-08-02 09:50:10.6 B Z
If all Time have more than one
we can do
df[df.Time.duplicated()|df.duplicated(keep=False)]
Upvotes: 1