Reputation: 329
I have a dataframe consists of transaction id, customer name and money spent, which looks like this:
id | name | amount
1 | Jennifer | 598
2 | Jennifer | 765
3 | Matt | 134
4 | George | 390
5 | Jennifer | 554
6 | Matt | 75
7 | Matt | 830
8 | Matt | 20
9 | Bob | 786
10 | Bob | 280
11 | Sam | 236
12 | Sam | 226
13 | Bob | 720
14 | Bob | 431
15 | Jennifer | 802
16 | Ann | 668
17 | Sam | 376
18 | Jennifer | 891
19 | Ann | 569
20 | Jennifer | 452
Now I want to make a new column called "amount1", which is the amount of money each custom spent last time he made a purchase, and the result will look like this:
id | name | amount | amount1
1 | Jennifer | 598 |
2 | Jennifer | 765 | 598
3 | Matt | 134 |
4 | George | 390 |
5 | Jennifer | 554 | 765
6 | Matt | 75 | 134
7 | Matt | 830 | 75
8 | Matt | 20 | 830
9 | Bob | 786 |
10 | Bob | 280 | 786
11 | Sam | 236 |
12 | Sam | 226 | 236
13 | Bob | 720 | 786
14 | Bob | 431 | 720
15 | Jennifer | 802 | 554
16 | Ann | 668 |
17 | Sam | 376 | 226
18 | Jennifer | 891 | 802
19 | Ann | 569 | 668
20 | Jennifer | 452 | 891
It is just iterating every row and search for all previous purchase record and update 'amount1' with the most recent purchase record.
I have tried with the code below, but i have about 200k rows of data, and it takes few hours to run. What is the most efficient way of doing this task?
df['amount1'] = np.nan
for index, row in df.iterrows():
purchase_id = row['id']
customer_name = row['name']
amt = df.query('id<@purchase_id and name==@customer')['amount'].values
if len(amt)>0:
df.loc[index,'amount1'] = amt[-1]
Upvotes: 2
Views: 181
Reputation: 862841
df['amount1'] = df.groupby(['name'])['amount'].shift()
print (df)
id name amount amount1
0 1 Jennifer 598 NaN
1 2 Jennifer 765 598.0
2 3 Matt 134 NaN
3 4 George 390 NaN
4 5 Jennifer 554 765.0
5 6 Matt 75 134.0
6 7 Matt 830 75.0
7 8 Matt 20 830.0
8 9 Bob 786 NaN
9 10 Bob 280 786.0
10 11 Sam 236 NaN
11 12 Sam 226 236.0
12 13 Bob 720 280.0
13 14 Bob 431 720.0
14 15 Jennifer 802 554.0
15 16 Ann 668 NaN
16 17 Sam 376 226.0
17 18 Jennifer 891 802.0
18 19 Ann 569 668.0
19 20 Jennifer 452 891.0
If need shift only positive amount
values use:
s = df['amount'].where(df['amount'] > 0)
df['amount1'] = s.groupby(df['name']).shift()
Upvotes: 1