pandas dataframe filtering row like groupby

Question

For example, I have a dataframe with two columns, a and b:

a = [1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3]
b = [1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1]

I expect the filtered result to be: [5,6,7,2,3,4,9,0,1]

Without using groupby (it takes too long on a very large dataframe to be usable), how do I keep only the last 3 items from each group in column a?

Divakar · Accepted Answer

Approach #1 : Here's a NumPy based approach -

In [89]: a = np.array([1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3])
    ...: b = np.array([1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1])
    ...: 

In [90]: idx = np.append(np.nonzero(a[1:] > a[:-1])[0], a.size-1)[:,None] - [2,1,0]

In [91]: b[idx].ravel()
Out[91]: array([5, 6, 7, 2, 3, 4, 9, 0, 1])

If those come from the columns named a and b of a dataframe df, then as a pre-processing step we need to extract them as arrays, like so -

a = df.a.values
b = df.b.values
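To tie Approach #1 back to the dataframe in the question, here is a minimal end-to-end sketch. The names last, idx, filtered and result are just illustrative, and it assumes column a is sorted ascending with at least three rows per group. It uses the row positions with df.iloc so that whole rows are returned rather than only b -

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3],
    "b": [1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1],
})

a = df["a"].to_numpy()

# Position of the last row of each group: wherever the next value of a
# is larger, plus the very last row of the frame.
last = np.append(np.nonzero(a[1:] > a[:-1])[0], a.size - 1)

# Broadcasting against [2, 1, 0] gives the last three positions per group.
idx = (last[:, None] - [2, 1, 0]).ravel()

# Select whole rows by position, no groupby involved.
filtered = df.iloc[idx]
result = filtered["b"].tolist()
print(result)  # [5, 6, 7, 2, 3, 4, 9, 0, 1]
```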

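Please note that this assumes at least three elements per group, and that the values in a are in ascending order, since the group boundaries are detected with a[1:] > a[:-1]. For cases with fewer than 3 elements per group, read on to the next approach.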
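To see why the group size matters for Approach #1, here is a small sketch with made-up data in which the middle group has a single row. The fixed offsets [2,1,0] then reach back into the previous group -

```python
import numpy as np

# Hypothetical data: group 2 has only one row.
a = np.array([1, 1, 1, 1, 2, 3, 3, 3])
b = np.array([10, 11, 12, 13, 20, 30, 31, 32])

last = np.append(np.nonzero(a[1:] > a[:-1])[0], a.size - 1)
idx = last[:, None] - [2, 1, 0]

# The second row of idx is [2, 3, 4]: positions 2 and 3 belong to group 1.
picked = b[idx].ravel()
print(picked)  # [11 12 13 12 13 20 30 31 32]
```

The values 12 and 13 show up twice, once for their own group and once in the slot meant for group 2. If the first group were the short one, the offsets would go negative and wrap around to the end of the array.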

Approach #2 : With Scipy's binary dilation to create a mask for selecting elements off b -

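import numpy as np
from scipy.ndimage import binary_dilation as imdilate

def filter_lastN(a, b, N):
    # Mark the last element of each group (where a increases, plus the final element)
    mask = np.zeros(a.size, dtype=bool)
    mask[np.append(np.nonzero(a[1:] > a[:-1])[0], a.size-1)] = True
    # Dilate each mark so that it covers the N elements ending at it
    return b[imdilate(mask, np.ones(N), origin=(N-1)//2)]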

Sample run -

In [198]: a
Out[198]: array([1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3])

In [199]: b
Out[199]: array([5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1])

In [200]: filter_lastN(a,b,3)
Out[200]: array([5, 6, 7, 2, 3, 4, 9, 0, 1])

In [201]: filter_lastN(a,b,5)
Out[201]: array([5, 6, 7, 0, 1, 2, 3, 4, 7, 8, 9, 0, 1])
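If SciPy is not available, the same kind of mask can be built with NumPy alone. This is an alternative sketch, not the method above: last_n_mask is a hypothetical helper that computes, for every row, its distance to the end of its own group and keeps rows closer than N. It uses != rather than > to find boundaries, so it only assumes that equal values of a are contiguous. The resulting boolean mask can be applied to a dataframe directly, which returns whole rows -

```python
import numpy as np
import pandas as pd

def last_n_mask(a, n):
    """Boolean mask selecting the last n rows of each run of equal values in a."""
    a = np.asarray(a)
    if a.size == 0:
        return np.zeros(0, dtype=bool)
    # Index of the final row of every run (a change in value ends a run).
    ends = np.append(np.nonzero(a[1:] != a[:-1])[0], a.size - 1)
    # Length of every run, used to repeat each end index over its own rows.
    lengths = np.diff(np.append(-1, ends))
    end_of_my_run = np.repeat(ends, lengths)
    # Keep rows that are fewer than n positions from the end of their run.
    return end_of_my_run - np.arange(a.size) < n

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3],
    "b": [5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1],
})

last3 = df[last_n_mask(df["a"].to_numpy(), 3)]
last5 = df[last_n_mask(df["a"].to_numpy(), 5)]
print(last3["b"].tolist())  # [5, 6, 7, 2, 3, 4, 9, 0, 1]
print(last5["b"].tolist())  # [5, 6, 7, 0, 1, 2, 3, 4, 7, 8, 9, 0, 1]
```

On this sample data the two results match the outputs of filter_lastN(a,b,3) and filter_lastN(a,b,5) shown above, including the first group, which has only three rows.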
