Reputation: 9337
Starting out with data like this:
np.random.seed(314)
df = pd.DataFrame({
'date':[pd.date_range('2016-04-01', '2016-04-05')[r] for r in np.random.randint(0,5,20)],
'cat':['ABCD'[r] for r in np.random.randint(0,4,20)],
'count': np.random.randint(0,100,20)
})
cat count date
0 B 84 2016-04-04
1 A 95 2016-04-05
2 D 89 2016-04-02
3 D 39 2016-04-05
4 A 39 2016-04-01
5 C 61 2016-04-05
6 C 58 2016-04-04
7 B 49 2016-04-03
8 D 20 2016-04-02
9 B 54 2016-04-01
10 B 87 2016-04-01
11 D 36 2016-04-05
12 C 13 2016-04-05
13 A 79 2016-04-04
14 B 91 2016-04-03
15 C 83 2016-04-05
16 C 85 2016-04-05
17 D 93 2016-04-01
18 C 32 2016-04-02
19 B 29 2016-04-03
Next, I calculate totals by date
, pivot cat
into columns, and calculate running totals for each column:
summary = df.groupby(['date','cat']).sum().unstack().fillna(0).cumsum()
cat A B C D
date
2016-04-01 80 235 99 0
2016-04-02 85 295 153 14
2016-04-03 111 363 224 14
2016-04-04 111 379 296 50
2016-04-05 111 511 296 50
Now I want to remove columns where the last column is less than some value, say 150. The result should look like:
cat B C
date
2016-04-01 235 99
2016-04-02 295 153
2016-04-03 363 224
2016-04-04 379 296
2016-04-05 511 296
I've figured out one part of it:
mask = summary[-1:].squeeze() > 150
cat
count A False
B True
C True
D False
will give me a mask for dropping columns. What I can't figure out is how to use it with a call to summary.drop(...)
. Any hints?
Upvotes: 1
Views: 1163
Reputation: 139172
Instead of dropping the columns you do not want, you can also select the ones you want (using the mask with boolean indexing):
In [16]: mask = summary[-1:].squeeze() > 220
In [17]: summary.loc[:, mask]
Out[17]:
count
cat B D
date
2016-04-01 141.0 94.0
2016-04-02 235.0 94.0
2016-04-03 235.0 144.0
2016-04-04 326.0 144.0
2016-04-05 384.0 229.0
(I used 220 instead of 150, otherwise all columns were selected)
Further, a better way to calculate the mask is probably the following:
mask = summary.iloc[-1] > 220
which just selects the last row (by position) instead of using squeeze.
Upvotes: 5