Rafa
Rafa

Reputation: 71

How to compare various specific cells in a DataFrame (in a relative manner)?

I got a DataFrame that loks like this:

enter image description here

I'd like to add a new column that checks for each row if the "attributes.count" in the current and also the 5 previous rows stays at zero. If so, I want it to return True. In Excel I would simply use realtive references for the last 5 cells, but I didn't find anything like that for pandas.

So if I am in row 55, I just want to check if the rows 50 to 55 contain nothing but zeros and, if so, return True.

I experimented with the .diff() method but that doesn't really do the trick, since it only checks the previous row and not a certain amount of prevous rows all together:

df["Is zero?"] = df["attributes.count"].diff()

Is there any workaround for this? Or maybe even a specific method I don't know of yet? (I'm an absolute beginner both in regards to coding and Python, so please forgive my stupidity :D )

Upvotes: 0

Views: 93

Answers (2)

run-out
run-out

Reputation: 3184

Just use rolling with sum.

df = pd.DataFrame(
    {"attributes_count": [0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 3, 1]}
)
print(df)
    attributes_count
0                  0
1                  0
2                  0
3                  0
4                  0
5                  1
6                  2
7                  0
8                  0
9                  0
10                 0
11                 0
12                 0
13                 0
14                 3
15                 1

Then create a new column using rolling over 5 period window and summing. If it sums to zero then that is True.

df["Is zero?"] = df["attributes_count"].rolling(5).sum()
print(df)
    attributes_count  Is zero?
0                  0       nan
1                  0       nan
2                  0       nan
3                  0       nan
4                  0    0.0000
5                  1    1.0000
6                  2    3.0000
7                  0    3.0000
8                  0    3.0000
9                  0    3.0000
10                 0    2.0000
11                 0    0.0000
12                 0    0.0000
13                 0    0.0000
14                 3    3.0000
15                 1    4.0000

Unforunately the true and false are flipped backwords. So we need to make 0 be 1.

df["Is zero?"] = np.where(df["Is zero?"], 0, 1)
print(df)
   attributes_count  Is zero?
0                  0         0
1                  0         0
2                  0         0
3                  0         0
4                  0         1
5                  1         0
6                  2         0
7                  0         0
8                  0         0
9                  0         0
10                 0         0
11                 0         1
12                 0         1
13                 0         1
14                 3         0
15                 1         0

Upvotes: 1

umbreon29
umbreon29

Reputation: 233

Suppose ['col1','col2','col3'] are the columns you want to check they are zero

s = 0
for col in ['col1','col2','col3']: 
     s += 1*(df[col] == 0)

s is one if there was a zero in one of the columns (for each row). Then simply define df['Is zero'] = s

Upvotes: 0

Related Questions