How to compare various specific cells in a DataFrame (in a relative manner)?

Question

I got a DataFrame that loks like this:

I'd like to add a new column that checks for each row if the "attributes.count" in the current and also the 5 previous rows stays at zero. If so, I want it to return True. In Excel I would simply use realtive references for the last 5 cells, but I didn't find anything like that for pandas.

So if I am in row 55, I just want to check if the rows 50 to 55 contain nothing but zeros and, if so, return True.

I experimented with the .diff() method but that doesn't really do the trick, since it only checks the previous row and not a certain amount of prevous rows all together:

df["Is zero?"] = df["attributes.count"].diff()

Is there any workaround for this? Or maybe even a specific method I don't know of yet? (I'm an absolute beginner both in regards to coding and Python, so please forgive my stupidity :D )

run-out · Accepted Answer

Just use rolling with sum.

df = pd.DataFrame(
    {"attributes_count": [0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 3, 1]}
)
print(df)
    attributes_count
0                  0
1                  0
2                  0
3                  0
4                  0
5                  1
6                  2
7                  0
8                  0
9                  0
10                 0
11                 0
12                 0
13                 0
14                 3
15                 1

Then create a new column using rolling over 5 period window and summing. If it sums to zero then that is True.

df["Is zero?"] = df["attributes_count"].rolling(5).sum()
print(df)
    attributes_count  Is zero?
0                  0       nan
1                  0       nan
2                  0       nan
3                  0       nan
4                  0    0.0000
5                  1    1.0000
6                  2    3.0000
7                  0    3.0000
8                  0    3.0000
9                  0    3.0000
10                 0    2.0000
11                 0    0.0000
12                 0    0.0000
13                 0    0.0000
14                 3    3.0000
15                 1    4.0000

Unforunately the true and false are flipped backwords. So we need to make 0 be 1.

df["Is zero?"] = np.where(df["Is zero?"], 0, 1)
print(df)
   attributes_count  Is zero?
0                  0         0
1                  0         0
2                  0         0
3                  0         0
4                  0         1
5                  1         0
6                  2         0
7                  0         0
8                  0         0
9                  0         0
10                 0         0
11                 0         1
12                 0         1
13                 0         1
14                 3         0
15                 1         0

How to compare various specific cells in a DataFrame (in a relative manner)?

Answers (2)

Related Questions