mane
mane

Reputation: 2233

Python Pandas count interval of occurrence

I have implemented solution with iterating rows but it takes too long because of size of dataframe. The problem is this:

I have a dataframe like this (ignore the first 3 columns):

Column D has only 1 (True) or 0 (False). 0 (or False) marks the end of a series of 1's (or True's). So the 0 in the 4th row means that there were four 1's in col D. Again, going down in col D, there were two 1's. Then only one 1's and so on.

A B C D
2 3 6 1
4 8 2 1
2 3 6 1
4 8 2 0
2 3 6 1
4 8 2 1
2 3 6 0
4 8 2 0
2 3 6 0
4 8 2 1
2 3 6 1
4 8 2 1
3 4 1 0
...
8 2 3 1
6 2 0 0

I would like to insert a column 'Interval' which shows these intervals like this.

A B C D Interval
2 3 6 1 
4 8 2 1
2 3 6 1
4 8 2 0 4
2 3 6 1
4 8 2 1
2 3 6 0 3
4 8 2 0 1
2 3 6 0 1
4 8 2 1 
2 3 6 1
4 8 2 1
3 4 1 0 4
...
8 2 3 1
6 2 0 0 2

I dont actually care which row the interval number is written or it can output the column somewhere else where I can do histograms, average intervals, etc.

Any way I can do this without iterating over the rows individually?

Upvotes: 1

Views: 860

Answers (1)

user3378649
user3378649

Reputation: 5354

We can do that by writing a function that iterate the list (D). We go through the list, initialize a counter by 1, whenever we find one we increment, whenever we find 0, we affect the value and re-do the same process.

import pandas as pd
import copy

df = pd.DataFrame([1,1,1,0,1,1,0,0,0,1,1,1,0])

df.columns = ['D']
d= copy.copy(df.D)

def transform(l):
  count=1
  for index,x in enumerate(l): 
    if x==0:
      l[index]=count
      count=1
    else:
      l[index]=0
      count+=1
  return l

df['intervales']=transform(t)
df['D']=d

print df

The Output:

     D  intervales
0   1           0
1   1           0
2   1           0
3   0           4
4   1           0
5   1           0
6   0           3
7   0           1
8   0           1
9   1           0
10  1           0
11  1           0
12  0           4

I tried to do that using itertools, but it leads to treat many cases.

# import itertools
# l= [list(g) for k,g in itertools.groupby(df.D,lambda x:x in [0]) ]

Upvotes: 1

Related Questions