thescoop

Reputation: 514

Using Groups, drop duplicated NaNs only

With my data grouped by Assay, Image and Roi, within each group I want to keep only the first row whose 'Intensity' value is NaN and drop the remaining NaN rows.

My attempt below deletes duplicates, but it isn't specific to NaN values — it also removes non-NaN duplicates.

from pandas import Series, DataFrame
import pandas as pd
import numpy as np

df = DataFrame({'assay':['cnt']*11,
                'image':['001']*10+['002'],
                'roi':['1']*5+['2']*5+['3'],
                'dist':[99,90,50,2,30,65,95,30,56,5,33],
                'cellArea':[99,90,50,2,30,65,95,30,56,5,33],
                'xy':np.fabs(np.random.randn(11)*100),
                'intensity':[88,88,1,3,67,67,67,95,1,3,2]},
               columns=['assay','image','roi','dist','xy','cellArea','intensity','adjacency'])
df.loc[df.intensity < 10, ['intensity','xy']] = np.nan
df

(screenshot of the resulting DataFrame in the original post)

df.groupby(['assay','image','roi']).apply(lambda x: x.drop_duplicates(['intensity'], keep='first'))

(screenshot of the output in the original post)

Upvotes: 0

Views: 47

Answers (1)

akuiper

Reputation: 214927

Instead of using drop_duplicates, you can combine duplicated with isnull inside the lambda, so only rows that are both a duplicate and NaN get dropped:

df.groupby(['assay','image','roi']).apply(lambda x: x.loc[~(x.duplicated(['intensity']) & x.intensity.isnull())])
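As a sketch, the same filter can also be done without groupby/apply, by including the grouping keys in the duplicated subset (duplicated treats NaN values as equal to each other, so repeated NaNs within a group are flagged). The small frame below is a hypothetical minimal example, not the asker's data:

```python
import numpy as np
import pandas as pd

# Minimal example: two NaNs in group ('cnt','001','1'), one NaN in ('cnt','001','2')
df = pd.DataFrame({
    'assay': ['cnt'] * 4,
    'image': ['001'] * 4,
    'roi': ['1', '1', '1', '2'],
    'intensity': [np.nan, np.nan, 5.0, np.nan],
})

# True only for rows that are a repeated NaN within their (assay, image, roi) group
mask = df['intensity'].isna() & df.duplicated(['assay', 'image', 'roi', 'intensity'])
result = df[~mask]
```

This keeps the first NaN per group (rows 0 and 3 here) and drops only the second NaN in roi '1', while leaving non-NaN rows untouched. It avoids the per-group Python overhead of apply, at the cost of listing the group keys twice.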

Upvotes: 2
