Reputation: 511
I have a dataframe like below:
name | date | col1 | col2 |
---|---|---|---|
A | 2021-03-01 | 0 | 1 |
A | 2021-03-02 | 0 | 0 |
A | 2021-03-03 | 3 | 1 |
A | 2021-03-04 | 1 | 0 |
A | 2021-03-05 | 3 | 1 |
A | 2021-03-06 | 1 | 0 |
B | 2021-03-01 | 1 | 0 |
B | 2021-03-02 | 2 | 0 |
B | 2021-03-03 | 3 | 1 |
B | 2021-03-04 | 0 | 1 |
B | 2021-03-05 | 0 | 0 |
B | 2021-03-06 | 0 | 0 |
I'd like to group by the names and find the number of days spanned by the nonzero entries of the other non-date columns (basically excluding any leading or trailing zeroes) to get something like:
name | col1 | col2 |
---|---|---|
A | 4 | 5 |
B | 3 | 2 |
How can I do this without resorting to a for loop?
Upvotes: 4
Views: 207
Reputation: 59549
Here's a method that might scale a lot better for many groups [it avoids the apply(lambda x:)
in favor of multiple groubpy builtin operations.]
Check for values !=0 then take a cummax
within group in both directions to label the spans with True. Then groupby
+ sum
to get the size.
cols = ['col1', 'col2']
df1 = df[cols].ne(0)
df1 = df1.groupby(df['name']).cummax() & df1[::-1].groupby(df['name']).cummax()
df1 = df1.groupby(df['name']).sum()
print(df1)
col1 col2
name
A 4 5
B 3 2
Upvotes: 0
Reputation: 195448
Another solution:
def fn(x):
i = x[(x != 0)].index
return i[-1] - i[0] + 1 if len(i) > 0 else 0
print(df.groupby("name").agg({"col1": fn, "col2": fn}))
Prints:
col1 col2
name
A 4 5
B 3 2
Upvotes: 1
Reputation: 1677
I think, np.trim_zeros
is what you are looking for:
>>> import numpy as np; import pandas as pd
>>> df = pd.DataFrame.from_dict({'name': ['A']*6 + ['B']*6, 'col1': [0, 0, 3, 1, 3, 1, 1, 2, 3, 0, 0, 0], 'col2': [1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0]})
>>> df
name col1 col2
0 A 0 1
1 A 0 0
2 A 3 1
3 A 1 0
4 A 3 1
5 A 1 0
6 B 1 0
7 B 2 0
8 B 3 1
9 B 0 1
10 B 0 0
11 B 0 0
>>> df.groupby('name').aggregate(lambda x: len(np.trim_zeros(x))).reset_index()
name col1 col2
0 A 4 5
1 B 3 2
Upvotes: 4