Pandas - Creating a new column, the value of which increases on each occurrence of value X in an old column

Question

I have a DataFrame of the following structure:

Now I want to create a new column B that starts at 0 and increments by one at each occurrence of 1 in column A. So the data frame above should look like the following:

Note that there is no pattern in the occurrences of 1s in A.

The code I have right now is:

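def _add_col_B(data):
    # Rows before the first 1 in A keep the placeholder value -1.
    data['B'] = -1

    # Index labels of the 1s in A (assumes the default 0..n-1 integer index),
    # plus an end marker so the last group has an upper bound.
    ones = list(data.index[data['A'] == 1])
    ones.append(len(data))

    sent = 0
    for i in range(len(ones) - 1):
        # .loc slices include the end label, so stop one row before the next 1.
        # A single .loc call avoids chained indexing, which may assign to a copy.
        data.loc[ones[i]:ones[i+1] - 1, 'B'] = sent
        sent = sent + 1

    return data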

%timeit -r 3 _add_col_B(data)
10 loops, best of 3: 184 ms per loop

But in my opinion it is extremely slow, especially since I need to do this repeatedly and for very large data frames. Is there a vectorized way of doing this?

Alex Riley · Accepted Answer

Taking a vectorised approach, you could write:

df['B'] = (df['A'] == 1).cumsum() - 1

Which yields the DataFrame:
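
As a minimal sketch of how the expression behaves, assuming a small made-up column A (the values below are illustrative only, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data; not the DataFrame from the question.
df = pd.DataFrame({"A": [1, 0, 0, 1, 0, 1, 1, 0]})

# (df['A'] == 1) is a boolean Series; cumsum() counts the 1s seen so far,
# and subtracting 1 makes the first group start at 0.
df["B"] = (df["A"] == 1).cumsum() - 1

print(df["B"].tolist())  # [0, 0, 0, 1, 1, 2, 3, 3]
```

Note that any rows before the first 1 in A would get -1, since the cumulative sum is still 0 there. That matches the -1 placeholder used in the loop version.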
