VincFort
VincFort

Reputation: 1180

Pandas shift based on different values to calculate percentages

I am trying to calculate percentages of first down from a dataframe.

Here is the dataframe

        down  distance
1        1.0      10.0
2        2.0      13.0
3        3.0      15.0
4        3.0      20.0
5        4.0       1.0
6        1.0      10.0
7        2.0       9.0
8        3.0       3.0
9        1.0      10.0

I would like to calculate the percent from first down, meaning for second down, what is the percent of yards gained. For third down, perc of third based on first.

For example, I would like to have the following output.

        down  distance    percentage

1        1.0      10.0    NaN
2        2.0      13.0    (13-10)/13
3        3.0      15.0    (15-10)/15
4        3.0      20.0    (20-10)/20
5        4.0       1.0    (1-10)/20
6        1.0      10.0    NaN       # New calculation
7        2.0       9.0    (9-10)/9
8        3.0       3.0    (3-10)/3
9        1.0      10.0    NaN

Thanks

Current solutions all work correctly for the first question.

Upvotes: 2

Views: 427

Answers (4)

piRSquared
piRSquared

Reputation: 294488

With Numpy Bits

Should be pretty zippy!

m = df.down.values == 1                # mask where equal to 1
i = np.flatnonzero(m)                  # positions where equal to 1
d = df.distance.values                 # Numpy array of distances

j = np.diff(np.append(i, len(df)))     # use diff to find distances between
                                       # values equal to 1.  Note that I append
                                       # the length of the df as a terminal value

k = i.repeat(j)                        # I repeat the positions where equal to 1
                                       # a number of times in order to fill in.
p = np.where(m, np.nan, 1 - d[k] / d)  # reduction of % formula while masking

df.assign(percentage=p)

   down  distance  percentage
1   1.0      10.0         NaN
2   2.0      13.0    0.230769
3   3.0      15.0    0.333333
4   3.0      20.0    0.500000
5   4.0       1.0   -9.000000
6   1.0      10.0         NaN
7   2.0       9.0   -0.111111
8   3.0       3.0   -2.333333
9   1.0      10.0         NaN

Upvotes: 2

user3483203
user3483203

Reputation: 51165

Using groupby and transform:

s = df.groupby(df.down.eq(1).cumsum()).distance.transform('first')
s = df.distance.sub(s).div(df.distance)
df['percentage'] = s.mask(s.eq(0))

   down  distance  percentage
1   1.0      10.0         NaN
2   2.0      13.0    0.230769
3   3.0      15.0    0.333333
4   3.0      20.0    0.500000
5   4.0       1.0   -9.000000
6   1.0      10.0         NaN
7   2.0       9.0   -0.111111
8   3.0       3.0   -2.333333
9   1.0      10.0         NaN

Upvotes: 2

jpp
jpp

Reputation: 164773

Here's a vectorised solution:

# define condition
cond = df['down'] == 1

# calculate value to subtract
first = df['distance'].where(cond).ffill().mask(cond)

# perform calculation
df['percentage'] = (df['distance'] - first) / df['distance']

print(df)

   down  distance  percentage
1   1.0      10.0         NaN
2   2.0      13.0    0.230769
3   3.0      15.0    0.333333
4   3.0      20.0    0.500000
5   4.0       1.0   -9.000000
6   1.0      10.0         NaN
7   2.0       9.0   -0.111111
8   3.0       3.0   -2.333333
9   1.0      10.0         NaN

Upvotes: 5

sacuL
sacuL

Reputation: 51395

use groupby to group by each time down is equal to 1, than transform with your desired calculation. Then you can find where down is 1 again, and convert to NaN (as the calculation is meaningless there, as per your example):

df['percentage'] = (df.groupby(df.down.eq(1).cumsum())['distance']
                .transform(lambda x: (x-x.iloc[0])/x))


df.loc[df.down.eq(1),'percentage'] = np.nan

>>> df
   down  distance  percentage
1   1.0      10.0         NaN
2   2.0      13.0    0.230769
3   3.0      15.0    0.333333
4   3.0      20.0    0.500000
5   4.0       1.0   -9.000000
6   1.0      10.0         NaN
7   2.0       9.0   -0.111111
8   3.0       3.0   -2.333333
9   1.0      10.0         NaN

Upvotes: 1

Related Questions