Reputation: 2657
I have a DataFrame like this.
>>> df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0], [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T
>>> df
     0    1    2    3    4    5
0  3.0  0.0  0.0  0.0  1.0  2.0
1  0.0  3.0  0.0  6.0  0.0  5.0
2  0.0  0.0  0.0  6.0  0.0  0.0
I want to keep only the first non-zero element in each column and replace every other non-zero value with zero.
>>> expected
     0    1    2    3    4    5
0  3.0  0.0  0.0  0.0  1.0  2.0
1  0.0  3.0  0.0  6.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0
My goal is to get a Series of the first non-zero elements. I thought of doing this via sum(), so I need the other elements in each column to be zero.
>>> expected.sum()
0 3.0
1 3.0
2 0.0
3 6.0
4 1.0
5 2.0
dtype: float64
Thank you very much in advance.
Upvotes: 4
Views: 505
Reputation: 23099
Another way is to first create your target DataFrame by passing a boolean condition to mask, then sum along axis 0.
df_new = df.mask(~df.ne(0).cumsum(0).cumsum(0).eq(1)).fillna(0)
     0    1    2    3    4    5
0  3.0  0.0  0.0  0.0  1.0  2.0
1  0.0  3.0  0.0  6.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0
then
df_new.sum(0)
0 3.0
1 3.0
2 0.0
3 6.0
4 1.0
5 2.0
dtype: float64
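To see why the double cumsum isolates the first non-zero entry, here is a small sketch of the intermediate steps on the question's frame. The variable names (`nonzero`, `running`, `first_only`) are mine, and `where(..., 0)` is used as an equivalent of the `mask(~...).fillna(0)` combination above.

```python
import pandas as pd

df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0],
                   [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T

nonzero = df.ne(0)                    # True where a cell is non-zero
running = nonzero.cumsum(0)           # non-zero cells seen so far, per column
# The second cumsum equals 1 only at the first non-zero cell: it is 0 above
# that cell and at least 2 below it.
first_only = running.cumsum(0).eq(1)

# Keep the first non-zero cell of each column, zero out everything else.
df_new = df.where(first_only, 0)
first_values = df_new.sum(0)
print(first_values)
```

A single cumsum would not be enough here, because `running` stays at 1 for every row below the first non-zero value until another non-zero value appears.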
Upvotes: 2
Reputation: 549
You could do something like:
import pandas as pd
# initialize table
df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0], [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T
# detect the row of the first non-zero value in each column (0 for an all-zero column)
# see https://stackoverflow.com/questions/50586146/find-first-non-zero-value-in-each-column-of-pandas-dataframe for details
non_zero_indexes = list(df.ne(0).idxmax())  # [0, 1, 0, 1, 0, 0]
for col_id in df.columns:
    # process every column, including those whose first non-zero value is in row 0
    # (e.g. column 5, [2.0, 5.0, 0.0], where the 5.0 must still be zeroed)
    keep = int(non_zero_indexes[col_id]) + 1  # number of leading rows to keep
    col_start = list(df[col_id].iloc[:keep])  # e.g. [0.0, 6.0]
    col_end = [0.0] * (len(df) - keep)  # [0.0], i.e. fill with zeros
    df[col_id] = col_start + col_end  # merge and get [0.0, 6.0, 0.0]
That way, you get the following output:
>>> df
     0    1    2    3    4    5
0  3.0  0.0  0.0  0.0  1.0  2.0
1  0.0  3.0  0.0  6.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0
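If only the Series of first non-zero values is needed, the loop can be skipped. The following is a minimal sketch of that idea, not part of the answer's code: it swaps idxmax for NumPy's `argmax` on the underlying array, and the names `values`, `first_row`, and `picked` are my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0],
                   [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T

values = df.to_numpy()
# Row position of the first non-zero cell per column; argmax returns 0 for an
# all-zero column, which then simply picks that column's 0.0 in row 0.
first_row = (values != 0).argmax(axis=0)
picked = values[first_row, np.arange(values.shape[1])]
first_values = pd.Series(picked, index=df.columns)
print(first_values)
```

This leaves `df` unmodified, which may or may not be what you want compared with the in-place loop.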
Upvotes: 0
Reputation: 71707
Mask the zeros, then bfill and select the first row using iloc:
df[df != 0].bfill().iloc[0].fillna(0)
0 3.0
1 3.0
2 0.0
3 6.0
4 1.0
5 2.0
Name: 0, dtype: float64
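For readers who want to follow the one-liner step by step, here is the same chain split into named intermediates on the question's frame. The names `masked`, `filled`, and `first_values` are mine, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0],
                   [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T

masked = df[df != 0]      # zeros become NaN, non-zero values are kept
filled = masked.bfill()   # each NaN takes the next non-NaN value below it
# After the backward fill, row 0 holds the first non-zero value of every
# column; an all-zero column is still NaN there, so it is filled with 0.
first_values = filled.iloc[0].fillna(0)
print(first_values)
```

Note that this yields the Series directly, without building the intermediate frame that the sum() approach relies on.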
Upvotes: 4