helloworldlevel
helloworldlevel

Reputation: 159

Python: how to find the first nonzero in a multi-series dataframe?

I have a dataframe with 5 columns indexed to date. the data is consisted of 0's and 1's. I would like to find the first nonzeros in each series and multiple it (and only it) by 100.

Date    A   B   C   D   E
3/1/16  0   0   0   0   0
3/2/16  0   0   1   0   0
3/3/16  1   0   0   0   0
3/4/16  0   1   0   0   0
3/7/16  0   0   1   0   1
3/8/16  0   0   0   0   1
3/9/16  0   0   0   1   1

I tried the following code, but it didn't work.

for col in df.columns:
    idx = df[col].first_valid_index()
    df[col][idx] = df[col][idx]*100.

Upvotes: 2

Views: 79

Answers (3)

jezrael
jezrael

Reputation: 863166

Use set_index first for all numeric columns and then create boolean mask with eq (same as ==) with chain by cumsum and compare too.

Then select by mask and multiple, last reset_index:

df = df.set_index('Date')

m = df.eq(1) & df.cumsum().eq(1)
df[m] *= 100

df = df.reset_index()
print (df)
     Date    A    B    C    D    E
0  3/1/16    0    0    0    0    0
1  3/2/16    0    0  100    0    0
2  3/3/16  100    0    0    0    0
3  3/4/16    0  100    0    0    0
4  3/7/16    0    0    1    0  100
5  3/8/16    0    0    0    0    1
6  3/9/16    0    0    0  100    1

Details:

print (df.cumsum())
        A  B  C  D  E
Date                 
3/1/16  0  0  0  0  0
3/2/16  0  0  1  0  0
3/3/16  1  0  1  0  0
3/4/16  1  1  1  0  0
3/7/16  1  1  2  0  1
3/8/16  1  1  2  0  2
3/9/16  1  1  2  1  3


print (df.cumsum().eq(1))
            A      B      C      D      E
Date                                     
3/1/16  False  False  False  False  False
3/2/16  False  False   True  False  False
3/3/16   True  False   True  False  False
3/4/16   True   True   True  False  False
3/7/16   True   True  False  False   True
3/8/16   True   True  False  False  False
3/9/16   True   True  False   True  False

print (df.eq(1))
            A      B      C      D      E
Date                                     
3/1/16  False  False  False  False  False
3/2/16  False  False   True  False  False
3/3/16   True  False  False  False  False
3/4/16  False   True  False  False  False
3/7/16  False  False   True  False   True
3/8/16  False  False  False  False   True
3/9/16  False  False  False   True   True

m = df.eq(1) & df.cumsum(axis=1).eq(1)
print (m)
            A      B      C      D      E
Date                                     
3/1/16  False  False  False  False  False
3/2/16  False  False   True  False  False
3/3/16   True  False  False  False  False
3/4/16  False   True  False  False  False
3/7/16  False  False   True  False  False
3/8/16  False  False  False  False   True
3/9/16  False  False  False   True  False

Setup:

from pandas.compat import StringIO

temp=u"""Date   A   B   C   D   E
3/1/16  0   0   0   0   0
3/2/16  0   0   1   0   0
3/3/16  1   0   0   0   0
3/4/16  0   1   0   0   0
3/7/16  0   0   1   0   1
3/8/16  0   0   0   0   1
3/9/16  0   0   0   1   1"""
df = pd.read_csv(StringIO(temp), sep="\s+")
print (df)

     Date  A  B  C  D  E
0  3/1/16  0  0  0  0  0
1  3/2/16  0  0  1  0  0
2  3/3/16  1  0  0  0  0
3  3/4/16  0  1  0  0  0
4  3/7/16  0  0  1  0  1
5  3/8/16  0  0  0  0  1
6  3/9/16  0  0  0  1  1

Upvotes: 5

cs95
cs95

Reputation: 402814

I knew there was a way to use argmax here.

df = df.set_index('Date')

v = df.values
v[v.argmax(0), np.arange(df.shape[1] - 1)] *= 100  
df[:] = v

df.reset_index()



     Date    A    B    C    D    E
0  3/1/16    0    0    0    0    0
1  3/2/16    0    0  100    0    0
2  3/3/16  100    0    0    0    0
3  3/4/16    0  100    0    0    0
4  3/7/16    0    0  100    0    1
5  3/8/16    0    0    0    0  100
6  3/9/16    0    0    0  100    1

Had a little help from here.

Upvotes: 3

akrun
akrun

Reputation: 887501

With the for loop, we can do

cols = df.columns[df.columns != 'Date']
for col in cols:
    idx = df[col][df[col] != 0].index[0]
    df[col][idx] = df[col][idx]*100

Upvotes: 2

Related Questions