crystallikelaw
crystallikelaw

Reputation: 15

In pandas, how to operate on the row with the first instance of a string?

I have a csv file, and I'm trying to convert a column with cumulative values to individual values. I can form most of the column with

df['delta'] = df['expenditure'].diff()

So for each person (A,B..) I want the change in expenditure since they last attended. What which gives me

person days  expenditure delta
A        1       10       
A        2       24       14
A        10      45       21
B        2        0      -45
B        7        2        2
B        8       10        8
C        5       50       40
C        6       78       28
C        7       90       12

and what I want is

person days  expenditure delta
A        1       10 --->  10
A        2       24       14
A        10      45       21
B        2        0 --->   0
B        7        2        2
B        8       10        8
C        5       50 --->  50
C        6       78       28
C        7       90       12

so for each person, I want their lowest day's expenditure value put in delta.

Additionally, if I'm trying to average delta by the days, how would I go about it? That is if I wanted

person days  expenditure delta
A        1       10       10
A        2       24       14
A        10      45       21/8
B        2        0        0
B        7        2        2/5
B        8       10        8

So 21/8 is the (change in expenditure)/(change in days) for A

Upvotes: 1

Views: 44

Answers (1)

jezrael
jezrael

Reputation: 862661

Use DataFrameGroupBy.diff with replace first missing values by original by Series.fillna:

df['delta'] = df.groupby('person')['expenditure'].diff().fillna(df['expenditure'])
print (df)
  person  days  expenditure  delta
0      A     1           10   10.0
1      A     2           24   14.0
2      A    10           45   21.0
3      B     2            0    0.0
4      B     7            2    2.0
5      B     8           10    8.0
6      C     5           50   50.0
7      C     6           78   28.0
8      C     7           90   12.0

And for second is possible processing both columns and then divide in DataFrame.eval:

df['delta'] = (df.groupby('person')[['expenditure', 'days']].diff()
                 .fillna(df[['expenditure','days']])
                 .eval('expenditure / days'))

What working same like:

df['delta'] = (df.groupby('person')['expenditure'].diff().fillna(df['expenditure'])
                .div(df.groupby('person')['days'].diff().fillna(df['days'])))

print (df)
  person  days  expenditure   delta
0      A     1           10  10.000
1      A     2           24  14.000
2      A    10           45   2.625
3      B     2            0   0.000
4      B     7            2   0.400
5      B     8           10   8.000
6      C     5           50  10.000
7      C     6           78  28.000
8      C     7           90  12.000

Upvotes: 1

Related Questions