Reputation:
I have a dataframe with different columns (like price, id, product and date) and I need to divide this dataframe into several dataframes based on the current date of the system (current_date = np.datetime64(date.today())).
For example, if today is 2020-02-07 I want to divide my main dataframe into three different ones where df1 would be the data of the last month (data of 2020-01-07 to 2020-02-07), df2 would be the data of the last three months (excluding the month already in df1 so it would be more accurate to say from 2019-10-07 to 2020-01-07) and df3 would be the data left on the original dataframe.
Is there some easy way to do this? Also, I've been trying to use Grouper but I keep getting this error over an over again: NameError: name 'Grouper' is not defined (my Pandas version is 0.24.2)
Upvotes: 0
Views: 197
Reputation: 862641
You can use offsets.DateOffset
for last 1mont and 3month datetimes, filter by boolean indexing
:
rng = pd.date_range('2019-10-10', periods=20, freq='5d')
df = pd.DataFrame({'date': rng, 'id': range(20)})
print (df)
date id
0 2019-10-10 0
1 2019-10-15 1
2 2019-10-20 2
3 2019-10-25 3
4 2019-10-30 4
5 2019-11-04 5
6 2019-11-09 6
7 2019-11-14 7
8 2019-11-19 8
9 2019-11-24 9
10 2019-11-29 10
11 2019-12-04 11
12 2019-12-09 12
13 2019-12-14 13
14 2019-12-19 14
15 2019-12-24 15
16 2019-12-29 16
17 2020-01-03 17
18 2020-01-08 18
19 2020-01-13 19
current_date = pd.to_datetime('now').floor('d')
print (current_date)
2020-02-07 00:00:00
last1m = current_date - pd.DateOffset(months=1)
last3m = current_date - pd.DateOffset(months=3)
m1 = (df['date'] > last1m) & (df['date'] <= current_date)
m2 = (df['date'] > last3m) & (df['date'] <= last1m)
#filter non match m1 or m2 masks
m3 = ~(m1 | m2)
df1 = df[m1]
df2 = df[m2]
df3 = df[m3]
print (df1)
date id
18 2020-01-08 18
19 2020-01-13 19
print (df2)
date id
6 2019-11-09 6
7 2019-11-14 7
8 2019-11-19 8
9 2019-11-24 9
10 2019-11-29 10
11 2019-12-04 11
12 2019-12-09 12
13 2019-12-14 13
14 2019-12-19 14
15 2019-12-24 15
16 2019-12-29 16
17 2020-01-03 17
print (df3)
date id
0 2019-10-10 0
1 2019-10-15 1
2 2019-10-20 2
3 2019-10-25 3
4 2019-10-30 4
5 2019-11-04 5
Upvotes: 1