Reputation: 159
I have a dataframe with 5 columns indexed to date. the data is consisted of 0's and 1's. I would like to find the first nonzeros in each series and multiple it (and only it) by 100.
Date A B C D E
3/1/16 0 0 0 0 0
3/2/16 0 0 1 0 0
3/3/16 1 0 0 0 0
3/4/16 0 1 0 0 0
3/7/16 0 0 1 0 1
3/8/16 0 0 0 0 1
3/9/16 0 0 0 1 1
I tried the following code, but it didn't work.
for col in df.columns:
idx = df[col].first_valid_index()
df[col][idx] = df[col][idx]*100.
Upvotes: 2
Views: 79
Reputation: 863166
Use set_index
first for all numeric columns and then create boolean mask with eq
(same as ==
) with chain by cumsum
and compare too.
Then select by mask and multiple, last reset_index
:
df = df.set_index('Date')
m = df.eq(1) & df.cumsum().eq(1)
df[m] *= 100
df = df.reset_index()
print (df)
Date A B C D E
0 3/1/16 0 0 0 0 0
1 3/2/16 0 0 100 0 0
2 3/3/16 100 0 0 0 0
3 3/4/16 0 100 0 0 0
4 3/7/16 0 0 1 0 100
5 3/8/16 0 0 0 0 1
6 3/9/16 0 0 0 100 1
Details:
print (df.cumsum())
A B C D E
Date
3/1/16 0 0 0 0 0
3/2/16 0 0 1 0 0
3/3/16 1 0 1 0 0
3/4/16 1 1 1 0 0
3/7/16 1 1 2 0 1
3/8/16 1 1 2 0 2
3/9/16 1 1 2 1 3
print (df.cumsum().eq(1))
A B C D E
Date
3/1/16 False False False False False
3/2/16 False False True False False
3/3/16 True False True False False
3/4/16 True True True False False
3/7/16 True True False False True
3/8/16 True True False False False
3/9/16 True True False True False
print (df.eq(1))
A B C D E
Date
3/1/16 False False False False False
3/2/16 False False True False False
3/3/16 True False False False False
3/4/16 False True False False False
3/7/16 False False True False True
3/8/16 False False False False True
3/9/16 False False False True True
m = df.eq(1) & df.cumsum(axis=1).eq(1)
print (m)
A B C D E
Date
3/1/16 False False False False False
3/2/16 False False True False False
3/3/16 True False False False False
3/4/16 False True False False False
3/7/16 False False True False False
3/8/16 False False False False True
3/9/16 False False False True False
Setup:
from pandas.compat import StringIO
temp=u"""Date A B C D E
3/1/16 0 0 0 0 0
3/2/16 0 0 1 0 0
3/3/16 1 0 0 0 0
3/4/16 0 1 0 0 0
3/7/16 0 0 1 0 1
3/8/16 0 0 0 0 1
3/9/16 0 0 0 1 1"""
df = pd.read_csv(StringIO(temp), sep="\s+")
print (df)
Date A B C D E
0 3/1/16 0 0 0 0 0
1 3/2/16 0 0 1 0 0
2 3/3/16 1 0 0 0 0
3 3/4/16 0 1 0 0 0
4 3/7/16 0 0 1 0 1
5 3/8/16 0 0 0 0 1
6 3/9/16 0 0 0 1 1
Upvotes: 5
Reputation: 402814
I knew there was a way to use argmax
here.
df = df.set_index('Date')
v = df.values
v[v.argmax(0), np.arange(df.shape[1] - 1)] *= 100
df[:] = v
df.reset_index()
Date A B C D E
0 3/1/16 0 0 0 0 0
1 3/2/16 0 0 100 0 0
2 3/3/16 100 0 0 0 0
3 3/4/16 0 100 0 0 0
4 3/7/16 0 0 100 0 1
5 3/8/16 0 0 0 0 100
6 3/9/16 0 0 0 100 1
Had a little help from here.
Upvotes: 3
Reputation: 887501
With the for
loop, we can do
cols = df.columns[df.columns != 'Date']
for col in cols:
idx = df[col][df[col] != 0].index[0]
df[col][idx] = df[col][idx]*100
Upvotes: 2