Justin Fay

Reputation: 2606

Selecting the min from groups in a pandas series

I have a pandas Series that looks like this:

>>> print(x)
0     1
1     2
2     3
3     4
4     0
5     0
6     0
7     0
8     9
9     6
10    3
11    5
12    7
Name: c, dtype: int64

I want to find the minimum value of each group of consecutive non-zero numbers (the zeros stay as they are). I may not be explaining this well, so here is the output I would like:

>>> print(result)
0     1
1     1
2     1
3     1
4     0
5     0
6     0
7     0
8     3
9     3
10    3
11    3
12    3
Name: c, dtype: int64
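To try the answers below, the example Series can be rebuilt from the printout. This is a minimal sketch that assumes the default 0 to 12 RangeIndex shown above; `result` is simply the desired output typed in by hand.

```python
import pandas as pd

# Rebuild the Series from the printout above (values and the name 'c' copied from it)
x = pd.Series([1, 2, 3, 4, 0, 0, 0, 0, 9, 6, 3, 5, 7], name='c')

# Desired output: every run of non-zero values is replaced by that run's minimum,
# and every run of zeros stays zero
result = pd.Series([1, 1, 1, 1, 0, 0, 0, 0, 3, 3, 3, 3, 3], name='c')
```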

Upvotes: 2

Views: 52

Answers (2)

piRSquared

Reputation: 294258

for and Numba

I want to use a for loop but speed it up with Numba

  • Yes: this is a for loop and not very pretty
  • No: it is not slow because I use Numba (-:

Imports

import pandas as pd
import numpy as np
from numba import njit

Define Function

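@njit
def f(x):
    y = []  # minimum of each run seen so far
    z = []  # run number of each element
    for a in x:
        if len(y) == 0:
            # the first element starts run 0
            y.append(a)
            z.append(0)
        else:
            if (y[-1] == 0) ^ (a == 0):
                # switched between zero and non-zero: start a new run
                y.append(a)
                z.append(z[-1] + 1)
            else:
                # still in the same run: update its running minimum
                y[-1] = min(y[-1], a)
                z.append(z[-1])
    # broadcast each run's minimum back to every element of that run
    return np.array(y)[np.array(z)]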

Use Function

pd.Series(f(x.to_numpy()), x.index)

0     1
1     1
2     1
3     1
4     0
5     0
6     0
7     0
8     3
9     3
10    3
11    3
12    3
dtype: int64
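If Numba is not installed, the same loop also runs in plain Python, just more slowly on large inputs. A minimal sketch assuming only NumPy and pandas; `run_min` is a hypothetical name, and `!=` on the two booleans plays the role of the `^` in `f`.

```python
import numpy as np
import pandas as pd


def run_min(values):
    """Same run-tracking loop as `f`, without the @njit decorator.

    mins[k] is the minimum of the k-th run and labels[i] is the run number
    of element i. A new run starts whenever the values switch between zero
    and non-zero.
    """
    mins = []
    labels = []
    for a in values:
        if len(mins) == 0:
            mins.append(a)
            labels.append(0)
        elif (mins[-1] == 0) != (a == 0):
            # zero/non-zero boundary: open a new run
            mins.append(a)
            labels.append(labels[-1] + 1)
        else:
            # same run: update its running minimum
            mins[-1] = min(mins[-1], a)
            labels.append(labels[-1])
    return np.array(mins)[np.array(labels, dtype=np.intp)]


x = pd.Series([1, 2, 3, 4, 0, 0, 0, 0, 9, 6, 3, 5, 7], name='c')
out = pd.Series(run_min(x.to_numpy()), x.index)
```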

itertools.groupby

Credit to room 6 for the assist.

from itertools import groupby, repeat

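def repeat_min(x):
    # key=bool splits x into consecutive runs of zero (False) and non-zero (True) values
    for _, group in groupby(x, key=bool):
        group = list(group)
        minval = min(group)
        # emit the run's minimum once for each element in the run
        yield from repeat(minval, len(group))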

pd.Series([*repeat_min(x)], x.index)

0     1
1     1
2     1
3     1
4     0
5     0
6     0
7     0
8     3
9     3
10    3
11    3
12    3
dtype: int64
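`itertools.groupby` only groups *consecutive* items with equal keys, which is why `key=bool` splits the data into alternating zero and non-zero runs instead of lumping all non-zero values together. A small sketch that prints those runs for the example data (the name `runs` is just for illustration):

```python
from itertools import groupby

import pandas as pd

x = pd.Series([1, 2, 3, 4, 0, 0, 0, 0, 9, 6, 3, 5, 7], name='c')

# bool(0) is False and any non-zero value is True, so each group is one run
runs = [(key, [int(v) for v in group]) for key, group in groupby(x, key=bool)]
for key, values in runs:
    print(key, values, 'min =', min(values))
```

The two non-zero runs come out as separate groups, with minimums 1 and 3, which is exactly what `repeat_min` repeats back.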

Upvotes: 3

cs95

Reputation: 402493

Use the shift-cumsum trick to give each run of zeros or non-zeros its own label, then call GroupBy.transform:

u = x.eq(0)
x.groupby(u.ne(u.shift()).cumsum()).transform('min')

0     1
1     1
2     1
3     1
4     0
5     0
6     0
7     0
8     3
9     3
10    3
11    3
12    3
Name: c, dtype: int64
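To see why this works, it helps to print the intermediate Series. `u` marks the zeros, `u.ne(u.shift())` is True wherever that flag changes (the first row is compared against NaN, so it is always True), and the cumulative sum turns those change points into consecutive run labels that `groupby` can use. The names `starts` and `run_id` are only for illustration.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 0, 0, 0, 0, 9, 6, 3, 5, 7], name='c')

u = x.eq(0)                 # True where the value is zero
starts = u.ne(u.shift())    # True at the first element of each run
run_id = starts.cumsum()    # running count of run starts = run label

print(pd.DataFrame({'x': x, 'u': u, 'starts': starts, 'run_id': run_id}))
```

Grouping by `run_id` gives three groups (labels 1, 2 and 3), and `transform('min')` writes each group's minimum back onto every row of that group.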

Upvotes: 3
