I have a DataFrame of the following structure:
A
0 1
1 2
2 3
3 1
4 2
5 1
6 2
7 3
Now I want to create a new column B that numbers the groups starting at each 1 in column A, counting from 0: B stays constant until the next 1 appears in A, then increases by one. So the data frame above should look like the following:
A B
0 1 0
1 2 0
2 3 0
3 1 1
4 2 1
5 1 2
6 2 2
7 3 2
Note that there is no pattern in the occurrences of 1s in A.
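To pin down the intended rule, here is a minimal pure-Python sketch (the function name `label_by_ones` is hypothetical, not from the question). It walks the values once and bumps a counter at every 1. It assumes that rows before the first 1 should get -1, which matches the `data['B'] = -1` initialisation in the code below.

```python
def label_by_ones(values):
    """Return, for each position, the number of 1s seen so far minus one."""
    counter = -1  # rows before the first 1 stay at -1
    labels = []
    for v in values:
        if v == 1:
            counter += 1  # every 1 starts a new group
        labels.append(counter)
    return labels


print(label_by_ones([1, 2, 3, 1, 2, 1, 2, 3]))  # [0, 0, 0, 1, 1, 2, 2, 2]
```

This is only a specification of the expected output; it is still a Python-level loop and will not be fast on large frames.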
The code I have right now is:
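def _add_col_B(data):
    # Assumes a default 0..n-1 RangeIndex, since len(data) is used as the end label.
    data['B'] = -1
    ones = list(data.index[data['A'] == 1])
    ones.append(len(data))
    sent = 0
    for i in range(len(ones) - 1):
        # Assign through a single .loc call; the chained form
        # data.loc[...][...] = ... may write to a copy and leave data unchanged.
        data.loc[ones[i]:ones[i + 1], 'B'] = sent
        sent = sent + 1
    return data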
%timeit -r 3 _add_col_B(data)
10 loops, best of 3: 184 ms per loop
But in my opinion, it is extremely slow, especially since I need to do this repeatedly and on very large data frames. Is there a vectorized way of doing this?
Taking a vectorised approach, you could write:
df['B'] = (df['A'] == 1).cumsum() - 1
This yields the DataFrame:
A B
0 1 0
1 2 0
2 3 0
3 1 1
4 2 1
5 1 2
6 2 2
7 3 2
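Why this works: `df['A'] == 1` is a boolean Series, and `cumsum()` counts each `True` as 1, so the running total goes up by one at every 1 in A; subtracting 1 makes the first group start at 0. A minimal self-contained sketch of the steps, plus the equivalent NumPy one-liner on the underlying array (assuming plain numeric data in A):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 1, 2, 1, 2, 3]})

# Step 1: mark the rows where a new group begins.
is_one = df["A"] == 1  # True, False, False, True, False, True, False, False

# Step 2: the running count of True values numbers the groups 1, 2, 3, ...
# Step 3: subtract 1 so that numbering starts at 0.
df["B"] = is_one.cumsum() - 1

# Same result computed directly on the NumPy array.
b_np = np.cumsum(df["A"].to_numpy() == 1) - 1

print(df["B"].tolist())  # [0, 0, 0, 1, 1, 2, 2, 2]
print(b_np.tolist())     # [0, 0, 0, 1, 1, 2, 2, 2]
```

If the first row of A is not a 1, the leading rows come out as -1, the same value the question's loop leaves there by default.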