Reputation: 71
I got a DataFrame that loks like this:
I'd like to add a new column that checks for each row if the "attributes.count" in the current and also the 5 previous rows stays at zero. If so, I want it to return True. In Excel I would simply use realtive references for the last 5 cells, but I didn't find anything like that for pandas.
So if I am in row 55, I just want to check if the rows 50 to 55 contain nothing but zeros and, if so, return True.
I experimented with the .diff() method but that doesn't really do the trick, since it only checks the previous row and not a certain amount of prevous rows all together:
df["Is zero?"] = df["attributes.count"].diff()
Is there any workaround for this? Or maybe even a specific method I don't know of yet? (I'm an absolute beginner both in regards to coding and Python, so please forgive my stupidity :D )
Upvotes: 0
Views: 93
Reputation: 3184
Just use rolling with sum.
df = pd.DataFrame(
{"attributes_count": [0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 3, 1]}
)
print(df)
attributes_count
0 0
1 0
2 0
3 0
4 0
5 1
6 2
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 3
15 1
Then create a new column using rolling over 5 period window and summing. If it sums to zero then that is True.
df["Is zero?"] = df["attributes_count"].rolling(5).sum()
print(df)
attributes_count Is zero?
0 0 nan
1 0 nan
2 0 nan
3 0 nan
4 0 0.0000
5 1 1.0000
6 2 3.0000
7 0 3.0000
8 0 3.0000
9 0 3.0000
10 0 2.0000
11 0 0.0000
12 0 0.0000
13 0 0.0000
14 3 3.0000
15 1 4.0000
Unforunately the true and false are flipped backwords. So we need to make 0 be 1.
df["Is zero?"] = np.where(df["Is zero?"], 0, 1)
print(df)
attributes_count Is zero?
0 0 0
1 0 0
2 0 0
3 0 0
4 0 1
5 1 0
6 2 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 1
12 0 1
13 0 1
14 3 0
15 1 0
Upvotes: 1
Reputation: 233
Suppose ['col1','col2','col3']
are the columns you want to check they are zero
s = 0
for col in ['col1','col2','col3']:
s += 1*(df[col] == 0)
s is one if there was a zero in one of the columns (for each row). Then simply define df['Is zero'] = s
Upvotes: 0