Nick Bohl
Nick Bohl

Reputation: 125

Calculate Percent-Change (over time) of pandas column values based on other column value

I'm working with an example dataset:

       date      name     point
0   4/24/2019   Martha   3617138
1   4/25/2019   Martha   3961918
2   4/26/2019   Martha   4774966
3   4/27/2019   Martha   5217946
4   4/24/2019   Alex     62700321
5   4/25/2019   Alex     66721020
6   4/26/2019   Alex     71745138
7   4/27/2019   Alex     88762943
8   4/28/2019   Alex    102772578
9   4/29/2019   Alex    129089274
10  3/1/2019    Josh     1063259
11  3/3/2019    Josh     1063259
12  3/4/2019    Josh     1063259
13  3/5/2019    Josh     1063259
14  3/6/2019    Josh     1063259

and a list of name values

nameslist = ['Martha', 'Alex', 'Josh']

I want to calculate the percent change of all rows, based on the identifier in the name column.

expected output:

name    percent change
Martha      30.7
Alex        51.4
Josh          0

I initially tried to iterate through my list and table, and add all rows that match the list value, append a list with the calculate of change, then move the the next value of my list, but I can't articulate my code properly to make that happen.

df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by='date')

growthlist=[]
temptable=[]
for i in nameslist:
    for j in df:
        temptable.append(df[df['name'].str.match(nameslist[i])])
        length=[]
        growth=temptable[0]-temptable[length-1]
        growthlist.append(i,growth)

but that generates error:

TypeError: list indices must be integers or slices, not str

I also wouldn't mind using .groupby() and .pct_change() to accomplish this goal, but

growth = df.groupby('name').pct_change()

generates a long traceback that ends with:

TypeError: unsupported operand type(s) for /: 'str' and 'float'

ultimately, I would like to nest this within a function so I could use it on other datasets and be able to choose my column name (the actual datasets i'm working with are not standardized so the target column names are often different)

def calc_growth(dataset,colname):

but I'm not sure if that's too much too ask for this one question.

Unfortunately, I'm quite lost with this question so any help would be appreciated. I'm also wondering if transformation is an easier way to go with this, because at least i will always know the exact location of the two figures I need to calculate, but I don't even know how I would start something like that.

Thanks

Upvotes: 0

Views: 3390

Answers (2)

Ethels
Ethels

Reputation: 1

assuming there is a fourth column, maybe description as in,

       date      name     point      descr
0   4/24/2019   Martha   3617138      12g of ecg
1   4/25/2019   Martha   3961918      12g of eg
2   4/26/2019   Martha   4774966      43m of grams
3   4/27/2019   Martha   5217946      13cm of dose
4   4/24/2019   Alex     62700321     32m of grams
5   4/25/2019   Alex     66721020     12g of egc
6   4/26/2019   Alex     71745138      43m of grams
7   4/27/2019   Alex     88762943      30cm of dose
8   4/28/2019   Alex    102772578      12g of egc
9   4/29/2019   Alex    129089274      43m of grams
10  3/1/2019    Josh     1063259       13cm of dose
11  3/3/2019    Josh     1063259       12g of eg
12  3/4/2019    Josh     1063259       12g of eg
13  3/5/2019    Josh     1063259       43m of grams   
14  3/6/2019    Josh     1063259       43m of grams

can you re-write the code to

df.groupby('name',sort=False).orderby('descr').apply(lambda x: (x['point'].values[-1] - x['point'].values[0]) / x['point'].values[-1] * 100)\
    .reset_index(name='pct change')\.reset_index(name='descr')

or what would you think is the right approach to incorporate the description column?

Upvotes: 0

Erfan
Erfan

Reputation: 42896

You can use apply with last and first value approached through .values to calculate the percentage change over the whole group:

df.groupby('name',sort=False).apply(lambda x: (x['point'].values[-1] - x['point'].values[0]) / x['point'].values[-1] * 100)\
    .reset_index(name='pct change')

     name               pct change
0  Martha  30.67889165583545363347
1    Alex  51.42871358932579539669
2    Josh   0.00000000000000000000

Explanation

First we use groupby on name which will give us a group (read: a dataframe) based on each unique name:

for _, d in df.groupby('name', sort=False):
    print(d, '\n')

        date    name    point
0 2019-04-24  Martha  3617138
1 2019-04-25  Martha  3961918
2 2019-04-26  Martha  4774966
3 2019-04-27  Martha  5217946 

        date  name      point
4 2019-04-24  Alex   62700321
5 2019-04-25  Alex   66721020
6 2019-04-26  Alex   71745138
7 2019-04-27  Alex   88762943
8 2019-04-28  Alex  102772578
9 2019-04-29  Alex  129089274 

         date  name    point
10 2019-03-01  Josh  1063259
11 2019-03-03  Josh  1063259
12 2019-03-04  Josh  1063259
13 2019-03-05  Josh  1063259
14 2019-03-06  Josh  1063259 

Then we apply our own made lambda function to each seperate group and apply the following calculation:

percentage change = (point last value - point first value) / point last value * 100


Then we use reset_index to get our name column out of the index, since groupby puts it as index.

Upvotes: 1

Related Questions