Reputation: 159
I have a data set with dates, customers and income:
Date CustomerIncome
0 1/1/2018 A 53
1 2/1/2018 A 36
2 3/1/2018 A 53
3 5/1/2018 A 89
4 6/1/2018 A 84
5 8/1/2018 A 84
6 9/1/2018 A 54
7 10/1/2018 A 19
8 11/1/2018 A 44
9 12/1/2018 A 80
10 1/1/2018 B 24
11 2/1/2018 B 100
12 9/1/2018 B 40
13 10/1/2018 B 47
14 12/1/2018 B 10
15 2/1/2019 B 5
For both customers there are missing dates as they purchased nothing at some months. I want to add per each customer what was the income of last month and also the rolling sum of income for the last year. Meaning, if there's a missing month, I'll see '0' at the shift(1) column of the following month that has income. And I'll see rolling sum of 12 months even if there weren't 12 observations.
This is the expected result:
Date CustomerIncome S(1)R(12)
0 1/1/2018 A 53 0 53
1 2/1/2018 A 36 53 89
2 3/1/2018 A 53 36 142
3 5/1/2018 A 89 0 231
4 6/1/2018 A 84 89 315
5 8/1/2018 A 84 0 399
6 9/1/2018 A 54 84 453
7 10/1/2018 A 19 54 472
8 11/1/2018 A 44 19 516
9 12/1/2018 A 80 44 596
10 1/1/2018 B 24 0 24
11 2/1/2018 B 100 24 124
12 9/1/2018 B 40 0 164
13 10/1/2018 B 47 40 211
14 12/1/2018 B 10 0 221
15 2/1/2019 B 5 0 102
So far, I've added the rows with missing dates with stack and unstack, but with multiple dates and customers, it explodes the data to millions of rows, crashing kernel with most rows are 0's.
Upvotes: 2
Views: 1063
Reputation: 16683
.shift
but have logic that if the gap is > 31 days, then make (S1)
= 0
df['Date'] = pd.to_datetime(df['Date']).dt.date
df['S(1)'] = df.groupby('Customer')['Income'].transform('shift').fillna(0)
s = (df['Date'] - df['Date'].shift())/np.timedelta64(1, '31D') <= 1
df['S(1)'] = df['S(1)'].where(s,0).astype(int)
df['Rolling Date'] = (df['Date'] - pd.Timedelta('1Y'))
df['R(12)'] = df.apply(lambda d: sum([z for x,y,z in
zip(df['Customer'], df['Date'], df['Income'])
if y > d['Rolling Date']
if y <= d['Date']
if x == d['Customer']]), axis=1)
df = df.drop('Rolling Date', axis=1)
df
Out[1]:
Date Customer Income S(1) R(12)
0 2018-01-01 A 53 0 53
1 2018-02-01 A 36 53 89
2 2018-03-01 A 53 36 142
3 2018-05-01 A 89 0 231
4 2018-06-01 A 84 89 315
5 2018-08-01 A 84 0 399
6 2018-09-01 A 54 84 453
7 2018-10-01 A 19 54 472
8 2018-11-01 A 44 19 516
9 2018-12-01 A 80 44 596
10 2018-01-01 B 24 0 24
11 2018-02-01 B 100 24 124
12 2018-09-01 B 40 0 164
13 2018-10-01 B 47 40 211
14 2018-12-01 B 10 0 221
15 2019-02-01 B 5 0 102
Upvotes: 1