Reputation: 125
I'm working with an example dataset:
date name point
0 4/24/2019 Martha 3617138
1 4/25/2019 Martha 3961918
2 4/26/2019 Martha 4774966
3 4/27/2019 Martha 5217946
4 4/24/2019 Alex 62700321
5 4/25/2019 Alex 66721020
6 4/26/2019 Alex 71745138
7 4/27/2019 Alex 88762943
8 4/28/2019 Alex 102772578
9 4/29/2019 Alex 129089274
10 3/1/2019 Josh 1063259
11 3/3/2019 Josh 1063259
12 3/4/2019 Josh 1063259
13 3/5/2019 Josh 1063259
14 3/6/2019 Josh 1063259
and a list of name values
nameslist = ['Martha', 'Alex', 'Josh']
I want to calculate the percent change of all rows, based on the identifier in the name column.
expected output:
name percent change
Martha 30.7
Alex 51.4
Josh 0
I initially tried to iterate through my list and table, and add all rows that match the list value, append a list with the calculate of change, then move the the next value of my list, but I can't articulate my code properly to make that happen.
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by='date')
growthlist=[]
temptable=[]
for i in nameslist:
for j in df:
temptable.append(df[df['name'].str.match(nameslist[i])])
length=[]
growth=temptable[0]-temptable[length-1]
growthlist.append(i,growth)
but that generates error:
TypeError: list indices must be integers or slices, not str
I also wouldn't mind using .groupby() and .pct_change() to accomplish this goal, but
growth = df.groupby('name').pct_change()
generates a long traceback that ends with:
TypeError: unsupported operand type(s) for /: 'str' and 'float'
ultimately, I would like to nest this within a function so I could use it on other datasets and be able to choose my column name (the actual datasets i'm working with are not standardized so the target column names are often different)
def calc_growth(dataset,colname):
but I'm not sure if that's too much too ask for this one question.
Unfortunately, I'm quite lost with this question so any help would be appreciated. I'm also wondering if transformation is an easier way to go with this, because at least i will always know the exact location of the two figures I need to calculate, but I don't even know how I would start something like that.
Thanks
Upvotes: 0
Views: 3390
Reputation: 1
assuming there is a fourth column, maybe description as in,
date name point descr
0 4/24/2019 Martha 3617138 12g of ecg
1 4/25/2019 Martha 3961918 12g of eg
2 4/26/2019 Martha 4774966 43m of grams
3 4/27/2019 Martha 5217946 13cm of dose
4 4/24/2019 Alex 62700321 32m of grams
5 4/25/2019 Alex 66721020 12g of egc
6 4/26/2019 Alex 71745138 43m of grams
7 4/27/2019 Alex 88762943 30cm of dose
8 4/28/2019 Alex 102772578 12g of egc
9 4/29/2019 Alex 129089274 43m of grams
10 3/1/2019 Josh 1063259 13cm of dose
11 3/3/2019 Josh 1063259 12g of eg
12 3/4/2019 Josh 1063259 12g of eg
13 3/5/2019 Josh 1063259 43m of grams
14 3/6/2019 Josh 1063259 43m of grams
can you re-write the code to
df.groupby('name',sort=False).orderby('descr').apply(lambda x: (x['point'].values[-1] - x['point'].values[0]) / x['point'].values[-1] * 100)\
.reset_index(name='pct change')\.reset_index(name='descr')
or what would you think is the right approach to incorporate the description column?
Upvotes: 0
Reputation: 42896
You can use apply
with last
and first
value approached through .values
to calculate the percentage change over the whole group:
df.groupby('name',sort=False).apply(lambda x: (x['point'].values[-1] - x['point'].values[0]) / x['point'].values[-1] * 100)\
.reset_index(name='pct change')
name pct change
0 Martha 30.67889165583545363347
1 Alex 51.42871358932579539669
2 Josh 0.00000000000000000000
First we use groupby on name
which will give us a group (read: a dataframe) based on each unique name:
for _, d in df.groupby('name', sort=False):
print(d, '\n')
date name point
0 2019-04-24 Martha 3617138
1 2019-04-25 Martha 3961918
2 2019-04-26 Martha 4774966
3 2019-04-27 Martha 5217946
date name point
4 2019-04-24 Alex 62700321
5 2019-04-25 Alex 66721020
6 2019-04-26 Alex 71745138
7 2019-04-27 Alex 88762943
8 2019-04-28 Alex 102772578
9 2019-04-29 Alex 129089274
date name point
10 2019-03-01 Josh 1063259
11 2019-03-03 Josh 1063259
12 2019-03-04 Josh 1063259
13 2019-03-05 Josh 1063259
14 2019-03-06 Josh 1063259
Then we apply our own made lambda
function to each seperate group and apply the following calculation:
percentage change = (point last value - point first value) / point last value * 100
Then we use reset_index
to get our name
column out of the index, since groupby
puts it as index.
Upvotes: 1