Reputation: 1180
I am trying to calculate percentages of first down from a dataframe.
Here is the dataframe
down distance
1 1.0 10.0
2 2.0 13.0
3 3.0 15.0
4 3.0 20.0
5 4.0 1.0
6 1.0 10.0
7 2.0 9.0
8 3.0 3.0
9 1.0 10.0
I would like to calculate the percent from first down, meaning for second down, what is the percent of yards gained. For third down, perc of third based on first.
For example, I would like to have the following output.
down distance percentage
1 1.0 10.0 NaN
2 2.0 13.0 (13-10)/13
3 3.0 15.0 (15-10)/15
4 3.0 20.0 (20-10)/20
5 4.0 1.0 (1-10)/20
6 1.0 10.0 NaN # New calculation
7 2.0 9.0 (9-10)/9
8 3.0 3.0 (3-10)/3
9 1.0 10.0 NaN
Thanks
Current solutions all work correctly for the first question.
Upvotes: 2
Views: 427
Reputation: 294488
Should be pretty zippy!
m = df.down.values == 1 # mask where equal to 1
i = np.flatnonzero(m) # positions where equal to 1
d = df.distance.values # Numpy array of distances
j = np.diff(np.append(i, len(df))) # use diff to find distances between
# values equal to 1. Note that I append
# the length of the df as a terminal value
k = i.repeat(j) # I repeat the positions where equal to 1
# a number of times in order to fill in.
p = np.where(m, np.nan, 1 - d[k] / d) # reduction of % formula while masking
df.assign(percentage=p)
down distance percentage
1 1.0 10.0 NaN
2 2.0 13.0 0.230769
3 3.0 15.0 0.333333
4 3.0 20.0 0.500000
5 4.0 1.0 -9.000000
6 1.0 10.0 NaN
7 2.0 9.0 -0.111111
8 3.0 3.0 -2.333333
9 1.0 10.0 NaN
Upvotes: 2
Reputation: 51165
Using groupby
and transform
:
s = df.groupby(df.down.eq(1).cumsum()).distance.transform('first')
s = df.distance.sub(s).div(df.distance)
df['percentage'] = s.mask(s.eq(0))
down distance percentage
1 1.0 10.0 NaN
2 2.0 13.0 0.230769
3 3.0 15.0 0.333333
4 3.0 20.0 0.500000
5 4.0 1.0 -9.000000
6 1.0 10.0 NaN
7 2.0 9.0 -0.111111
8 3.0 3.0 -2.333333
9 1.0 10.0 NaN
Upvotes: 2
Reputation: 164773
Here's a vectorised solution:
# define condition
cond = df['down'] == 1
# calculate value to subtract
first = df['distance'].where(cond).ffill().mask(cond)
# perform calculation
df['percentage'] = (df['distance'] - first) / df['distance']
print(df)
down distance percentage
1 1.0 10.0 NaN
2 2.0 13.0 0.230769
3 3.0 15.0 0.333333
4 3.0 20.0 0.500000
5 4.0 1.0 -9.000000
6 1.0 10.0 NaN
7 2.0 9.0 -0.111111
8 3.0 3.0 -2.333333
9 1.0 10.0 NaN
Upvotes: 5
Reputation: 51395
use groupby to group by each time down
is equal to 1, than transform with your desired calculation. Then you can find where down
is 1 again, and convert to NaN
(as the calculation is meaningless there, as per your example):
df['percentage'] = (df.groupby(df.down.eq(1).cumsum())['distance']
.transform(lambda x: (x-x.iloc[0])/x))
df.loc[df.down.eq(1),'percentage'] = np.nan
>>> df
down distance percentage
1 1.0 10.0 NaN
2 2.0 13.0 0.230769
3 3.0 15.0 0.333333
4 3.0 20.0 0.500000
5 4.0 1.0 -9.000000
6 1.0 10.0 NaN
7 2.0 9.0 -0.111111
8 3.0 3.0 -2.333333
9 1.0 10.0 NaN
Upvotes: 1