Reputation: 57
I have the following extract from a larger Data Frame:
df = pd.DataFrame({'Tipe': ['DM','DM','DM','DS','DS','DI','DI','DM','DI','DS','DM','DM','DM','DM','DI','DM','DM','DS','DS','DS','DM']})
The objective is to create a column "D" where, for an index greater than "X", the program must search the column "Type" for the last n values and count only those that are identified as 'DM'.
for example, if "x" and "n" were 5, I expect something like this:
Tipe D
0 DM NaN
1 DM NaN
2 DM NaN
3 DS NaN
4 DS NaN
5 DI NaN
6 DI 2.0
7 DM 1.0
8 DI 1.0
9 DS 1.0
10 DM 1.0
11 DM 2.0
12 DM 3.0
13 DM 3.0
14 DI 4.0
15 DM 4.0
16 DM 4.0
17 DS 4.0
18 DS 3.0
19 DS 2.0
20 DM 2.0
I try with ".tail" but the existing 'DM' values throughout the entire column, not just the ones in the last n values.
Upvotes: 1
Views: 87
Reputation: 29742
Use pandas.Series.shift
and rolling
with where
:
x = 5
n = 5
s = df["Type"].eq("DM").shift().rolling(n).sum()
df["D"] = s.where(s.index > x)
Output:
Type D
0 DM NaN
1 DM NaN
2 DM NaN
3 DS NaN
4 DS NaN
5 DI NaN
6 DI 2.0
7 DM 1.0
8 DI 1.0
9 DS 1.0
10 DM 1.0
11 DM 2.0
12 DM 3.0
13 DM 3.0
14 DI 4.0
15 DM 4.0
16 DM 4.0
17 DS 4.0
18 DS 3.0
19 DS 2.0
20 DM 2.0
Upvotes: 2