Syed Fahad Sultan

Reputation: 508

Pandas - Creating a new column, the value of which increases on each occurrence of value X in an old column

I have a DataFrame of the following structure:

   A
0  1
1  2
2  3
3  1
4  2
5  1
6  2
7  3

Now I want to create a new column B whose value starts at 0 and increases by one at each subsequent occurrence of 1 in column A. So the DataFrame above should look like this:

   A  B
0  1  0
1  2  0
2  3  0
3  1  1
4  2  1
5  1  2
6  2  2
7  3  2

Note that there is no pattern in the occurrences of 1s in A.
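To pin down the rule precisely, here is a plain-Python reference sketch. The helper name `label_runs` is hypothetical (not from the question). It keeps a counter that starts at -1 and increments whenever it sees a 1, so rows before the first 1 would be labelled -1, the same default the loop below uses.

```python
import pandas as pd


def label_runs(values, marker=1):
    """Hypothetical reference helper: label each row with the number of
    `marker` values seen so far, minus one (rows before the first marker get -1)."""
    labels = []
    count = -1
    for v in values:
        if v == marker:
            count += 1  # a new block starts at every occurrence of the marker
        labels.append(count)
    return labels


# Sample data from the question
data = pd.DataFrame({'A': [1, 2, 3, 1, 2, 1, 2, 3]})
data['B'] = label_runs(data['A'])
print(data)
```

This is only meant as a specification to check faster versions against; it iterates row by row, so it is not a performance fix.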

The code I have right now is:

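def _add_col_B(data):
    # Default label for rows before the first 1 in A
    data['B'] = -1

    # Index labels of every row where A == 1, plus a sentinel past the end
    # (the sentinel assumes a default RangeIndex)
    ones = list(data.index[data['A'] == 1])
    ones.append(len(data))

    sent = 0
    for i in range(len(ones) - 1):
        # Label the block from this 1 up to the next 1. .loc slicing is
        # inclusive; the overlapping row is overwritten on the next pass.
        # A single .loc call avoids chained assignment, which may not write back.
        data.loc[ones[i]:ones[i + 1], 'B'] = sent
        sent += 1

    return data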

%timeit -r 3 _add_col_B(data)
10 loops, best of 3: 184 ms per loop

But this is extremely slow, especially since I need to run it repeatedly on very large data frames. Is there a vectorized way of doing this?

Upvotes: 3

Views: 416

Answers (1)

Alex Riley

Reputation: 176850

Taking a vectorised approach, you could write:

df['B'] = (df['A'] == 1).cumsum() - 1

This yields the DataFrame:

   A  B
0  1  0
1  2  0
2  3  0
3  1  1
4  2  1
5  1  2
6  2  2
7  3  2
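To see why the one-liner works, the sketch below breaks it into its intermediate steps on the example data from the question. Comparing with 1 gives a boolean Series, `cumsum()` treats True as 1 and False as 0 so it counts the 1s seen so far, and subtracting 1 makes the first block start at 0. The NumPy line at the end is the same computation on the underlying array via `to_numpy()`, shown as an alternative rather than a benchmarked improvement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 1, 2, 3]})

# Step 1: True wherever A equals 1
is_one = df['A'] == 1

# Step 2: running count of the 1s seen so far (True counts as 1, False as 0)
running = is_one.cumsum()

# Step 3: shift down by one so the first block is labelled 0
df['B'] = running - 1

# The same computation on the underlying NumPy array
b_np = np.cumsum(df['A'].to_numpy() == 1) - 1

print(pd.DataFrame({'A': df['A'], 'is_one': is_one,
                    'running': running, 'B': df['B']}))
```

If A does not start with a 1, the rows before the first 1 come out as -1, which matches the -1 default in the question's loop.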

Upvotes: 5
