Reputation: 1623
This is a general question on vectorization, but I will use an example to help ask the question. I have a dataframe df
with df[col_1]
bool (True/False).
In df[col_2]
, I would like to return another True/False based on if the prior five rows of column 1, df[col_1][i-6:i-1]
, contain a match for df[col_1][i]
.
This is the loop I am using now, but it is one of many so I think they must be slowing things down as the data gets larger:
for i in df.index:
if i < 6:
df[col_2][i] = 0.
else:
df[col_2][i] = df[col_1][i] not in tuple(df[col_1].ix[i-6:i-1,col_1)
...output looks like this:
. col_1 col_2
0 TRUE
1 TRUE
2 TRUE
3 TRUE
4 FALSE
5 FALSE FALSE
6 FALSE FALSE
7 FALSE FALSE
8 FALSE FALSE
9 TRUE TRUE
10 FALSE FALSE
11 FALSE FALSE
12 FALSE FALSE
13 FALSE FALSE
14 TRUE FALSE
15 TRUE FALSE
16 TRUE FALSE
17 TRUE FALSE
18 TRUE FALSE
19 TRUE FALSE
20 FALSE TRUE
How can I do this in pandas with vectorization - maybe using shift()
or an offset function?
Upvotes: 3
Views: 160
Reputation: 30444
Here's a simple vectorized solution that should be pretty fast, although there is probably a more elegant way to write it. You can just ignore the first 5 rows or overwrite them to NaN if you prefer.
df = pd.DataFrame({ 'col_1':[True,True,True,True,False,False,False,False,
False,True,False,False,False,False,True,True,
True,True,True,True,False] })
df['col_2'] = ((df!=df.shift(1)) & (df!=df.shift(2)) & (df!=df.shift(3)) &
(df!=df.shift(4)) & (df!=df.shift(5)))
If speed really matters, you could do something like the following. It's more than 3x faster than the above and probably about as efficient as you can do here. This is just using the fact that rolling_sum()
will interpret booleans as 0/1 and you just need to know if the sum is 0 or 5.
df['rollsum'] = pd.rolling_sum(df.col_1,6) - df.col_1
df['col_3'] = ( ((df.col_1==True ) & (df.rollsum==0))
| ((df.col_1==False) & (df.rollsum==5)) )
col_1 col_2 rollsum col_3
0 True True NaN False
1 True False NaN False
2 True False NaN False
3 True False NaN False
4 False True NaN False
5 False False 4 False
6 False False 3 False
7 False False 2 False
8 False False 1 False
9 True True 0 True
10 False False 1 False
11 False False 1 False
12 False False 1 False
13 False False 1 False
14 True False 1 False
15 True False 1 False
16 True False 2 False
17 True False 3 False
18 True False 4 False
19 True False 5 False
20 False True 5 True
Upvotes: 2