jonboy
jonboy

Reputation: 374

Filter df if more than two unique values - pandas

I have a df that contains values at various time points. I have two separate columns which should display a single set of unique values for each time point. This occurs for the most part but sometimes time points contain multiple unique values. I'm hoping to filter these using conditional logic.

For the df below, there are unique time points in Time. I only want to have one set of unique values for Value and Object for each time point. If two unique items occur in Value I want to drop X and keep Y, regardless of what is in Object. If there are two unique items in Object then I want to keep the first row. In this instance, the items will be the same in Value.

df = pd.DataFrame({   
        'Time' : ['2019-08-02 09:50:10.1','2019-08-02 09:50:10.1','2019-08-02 09:50:10.2','2019-08-02 09:50:10.2','2019-08-02 09:50:10.3','2019-08-02 09:50:10.3','2019-08-02 09:50:10.4','2019-08-02 09:50:10.4','2019-08-02 09:50:10.6','2019-08-02 09:50:10.6'],
        'Object' : ['A','A','B','B','C','A','C','B','B','B'],                 
        'Value' : ['X','X',np.nan,np.nan,'Y','X','Y','Y','Z','Z'],                  
        })

I initially considered returning the rows that contain more than one unique set of items at each time point, which could be used for filtering.

unq_Object = df.groupby('Time').apply(lambda x: x['Object'].unique())
unq_Value = df.groupby('Time').apply(lambda x: x['Value'].unique())

But I'm not sure if there is a more efficient method?

Intended output:

                    Time Object Value
0  2019-08-02 09:50:10.1      A     X
1  2019-08-02 09:50:10.1      A     X
2  2019-08-02 09:50:10.2      B   NaN
3  2019-08-02 09:50:10.2      B   NaN
4  2019-08-02 09:50:10.3      C     Y
5  2019-08-02 09:50:10.4      C     Y
6  2019-08-02 09:50:10.6      B     Z
7  2019-08-02 09:50:10.6      B     Z

Upvotes: 1

Views: 344

Answers (1)

BENY
BENY

Reputation: 323396

Update use duplicate

df[df.duplicated(keep=False)|df.index.isin(df.groupby('Time').head(1).index)]
Out[187]: 
                    Time Object Value
0  2019-08-02 09:50:10.1      A     X
1  2019-08-02 09:50:10.1      A     X
2  2019-08-02 09:50:10.2      B   NaN
3  2019-08-02 09:50:10.2      B   NaN
4  2019-08-02 09:50:10.3      C     Y
6  2019-08-02 09:50:10.4      C     Y
8  2019-08-02 09:50:10.6      B     Z
9  2019-08-02 09:50:10.6      B     Z

If all Time have more than one

we can do

df[df.Time.duplicated()|df.duplicated(keep=False)]

Upvotes: 1

Related Questions