Calculate Percent-Change (over time) of pandas column values based on other column value

Question

I'm working with an example dataset:

       date      name     point
0   4/24/2019   Martha   3617138
1   4/25/2019   Martha   3961918
2   4/26/2019   Martha   4774966
3   4/27/2019   Martha   5217946
4   4/24/2019   Alex     62700321
5   4/25/2019   Alex     66721020
6   4/26/2019   Alex     71745138
7   4/27/2019   Alex     88762943
8   4/28/2019   Alex    102772578
9   4/29/2019   Alex    129089274
10  3/1/2019    Josh     1063259
11  3/3/2019    Josh     1063259
12  3/4/2019    Josh     1063259
13  3/5/2019    Josh     1063259
14  3/6/2019    Josh     1063259

and a list of name values

nameslist = ['Martha', 'Alex', 'Josh']

I want to calculate the percent change of all rows, based on the identifier in the name column.

expected output:

name    percent change
Martha      30.7
Alex        51.4
Josh          0

I initially tried to iterate through my list and table, and add all rows that match the list value, append a list with the calculate of change, then move the the next value of my list, but I can't articulate my code properly to make that happen.

df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by='date')

growthlist=[]
temptable=[]
for i in nameslist:
    for j in df:
        temptable.append(df[df['name'].str.match(nameslist[i])])
        length=[]
        growth=temptable[0]-temptable[length-1]
        growthlist.append(i,growth)

but that generates error:

TypeError: list indices must be integers or slices, not str

I also wouldn't mind using .groupby() and .pct_change() to accomplish this goal, but

growth = df.groupby('name').pct_change()

generates a long traceback that ends with:

TypeError: unsupported operand type(s) for /: 'str' and 'float'

ultimately, I would like to nest this within a function so I could use it on other datasets and be able to choose my column name (the actual datasets i'm working with are not standardized so the target column names are often different)

def calc_growth(dataset,colname):

but I'm not sure if that's too much too ask for this one question.

Unfortunately, I'm quite lost with this question so any help would be appreciated. I'm also wondering if transformation is an easier way to go with this, because at least i will always know the exact location of the two figures I need to calculate, but I don't even know how I would start something like that.

Thanks

Erfan · Accepted Answer

You can use apply with last and first value approached through .values to calculate the percentage change over the whole group:

df.groupby('name',sort=False).apply(lambda x: (x['point'].values[-1] - x['point'].values[0]) / x['point'].values[-1] * 100)\
    .reset_index(name='pct change')

     name               pct change
0  Martha  30.67889165583545363347
1    Alex  51.42871358932579539669
2    Josh   0.00000000000000000000

Explanation

First we use groupby on name which will give us a group (read: a dataframe) based on each unique name:

for _, d in df.groupby('name', sort=False):
    print(d, '
')

        date    name    point
0 2019-04-24  Martha  3617138
1 2019-04-25  Martha  3961918
2 2019-04-26  Martha  4774966
3 2019-04-27  Martha  5217946 

        date  name      point
4 2019-04-24  Alex   62700321
5 2019-04-25  Alex   66721020
6 2019-04-26  Alex   71745138
7 2019-04-27  Alex   88762943
8 2019-04-28  Alex  102772578
9 2019-04-29  Alex  129089274 

         date  name    point
10 2019-03-01  Josh  1063259
11 2019-03-03  Josh  1063259
12 2019-03-04  Josh  1063259
13 2019-03-05  Josh  1063259
14 2019-03-06  Josh  1063259

Then we apply our own made lambda function to each seperate group and apply the following calculation:

percentage change = (point last value - point first value) / point last value * 100

Then we use reset_index to get our name column out of the index, since groupby puts it as index.

Calculate Percent-Change (over time) of pandas column values based on other column value

Answers (2)

Explanation

Related Questions