crissal

Reputation: 2657

Keep first element for column in a DataFrame

I have a DataFrame like this.

>>> df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0], [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T
>>> df
     0    1    2    3    4    5
0  3.0  0.0  0.0  0.0  1.0  2.0
1  0.0  3.0  0.0  6.0  0.0  5.0
2  0.0  0.0  0.0  6.0  0.0  0.0

What I want to do is keep the first non-zero element of each column and replace every other non-zero value in that column with zero.

>>> expected
     0    1    2    3    4    5
0  3.0  0.0  0.0  0.0  1.0  2.0
1  0.0  3.0  0.0  6.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0

My goal is to get a Series of the first non-zero elements. I thought of doing this via sum(), which is why I need the other elements in each column to be zero.

>>> expected.sum()
0    3.0
1    3.0
2    0.0
3    6.0
4    1.0
5    2.0
dtype: float64

Thank you very much in advance.

Upvotes: 4

Views: 505

Answers (3)

Umar.H

Reputation: 23099

Another way is to first build your target DataFrame with a boolean mask, then sum along axis 0. Applying cumsum twice to the non-zero flags yields 1 only at the first non-zero row of each column.

df_new = df.mask(~df.ne(0).cumsum(0).cumsum(0).eq(1)).fillna(0)

     0    1    2    3    4    5
0  3.0  0.0  0.0  0.0  1.0  2.0
1  0.0  3.0  0.0  6.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0

then

df_new.sum(0)

0    3.0
1    3.0
2    0.0
3    6.0
4    1.0
5    2.0
dtype: float64
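To see why the double cumsum isolates the first non-zero entry, the chain can be split into named steps. This is only a sketch on the question's df; the intermediate names (nonzero, running, twice, first_only) are mine, and df.where(..., 0) is swapped in for mask(...).fillna(0).

```python
import pandas as pd

df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0],
                   [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T

nonzero = df.ne(0)                # True where a value is non-zero
running = nonzero.cumsum(axis=0)  # non-zero values seen so far, per column
twice = running.cumsum(axis=0)    # 0 before the first non-zero row, 1 at it, >= 2 after
first_only = twice.eq(1)          # flags exactly the first non-zero row

# Keep the flagged cells and set everything else to 0.
df_new = df.where(first_only, 0)
print(df_new.sum(axis=0))
```

A column that is all zeros never reaches 1, so it stays all zeros.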

Upvotes: 2

SpaceBurger

Reputation: 549

You could do something like:

import pandas as pd

# initialize table
df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0], [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T

# detect first non-zero value
# see https://stackoverflow.com/questions/50586146/find-first-non-zero-value-in-each-column-of-pandas-dataframe for details
non_zero_indexes = list(df.ne(0).idxmax()) # [0, 1, 0, 1, 0, 0]

for col_id in df.columns:
  # keep everything up to and including the first non-zero value;
  # this also covers columns whose first non-zero value is in row 0
  first = non_zero_indexes[col_id]
  col_start = list(df[col_id][:first+1]) # e.g. [0.0, 6.0]
  col_end   = [0.0] * (len(df) - len(col_start)) # [0.0], i.e. fill with zeros
  df[col_id] = col_start + col_end # merge and get [0.0, 6.0, 0.0]

That way, you get the following output:

>>> df
     0    1    2    3    4    5
0  3.0  0.0  0.0  0.0  1.0  2.0
1  0.0  3.0  0.0  6.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0
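The same "locate the first non-zero row per column" idea can probably be done without a Python loop by indexing the underlying NumPy array. This is a rough sketch rather than a tuned solution; the names first_pos, cols, result and out are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0],
                   [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T

values = df.to_numpy()
# Row position of the first non-zero value per column (0 if the column is all zeros).
first_pos = (values != 0).argmax(axis=0)
cols = np.arange(values.shape[1])

# Start from zeros and copy over only the first non-zero value of each column.
result = np.zeros_like(values)
result[first_pos, cols] = values[first_pos, cols]

out = pd.DataFrame(result, index=df.index, columns=df.columns)
print(out)
```

For an all-zero column, argmax returns 0 and the copied value is itself 0, so no special case is needed.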

Upvotes: 0

Shubham Sharma

Reputation: 71707

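Mask the zeros, then bfill and select the first row using iloc. This returns the Series of first non-zero values directly, without building the intermediate DataFrame.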

df[df != 0].bfill().iloc[0].fillna(0)

0    3.0
1    3.0
2    0.0
3    6.0
4    1.0
5    2.0
Name: 0, dtype: float64
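Broken into steps, the one-liner might look like the sketch below. The intermediate names (masked, filled, first) are mine, and it assumes the data contains no genuine NaN values, since those would be treated the same as zeros.

```python
import pandas as pd

df = pd.DataFrame([[3., 0, 0], [0, 3., 0], [0, 0, 0],
                   [0, 6., 6.], [1., 0, 0], [2., 5., 0]]).T

masked = df[df != 0]              # zeros become NaN
filled = masked.bfill()           # pull the next non-NaN value upward in each column
first = filled.iloc[0].fillna(0)  # row 0 now holds each column's first non-zero value

print(first)
```

The final fillna(0) only matters for columns that are entirely zero, such as column 2 here.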

Upvotes: 4
