Reputation: 2233
I have implemented solution with iterating rows but it takes too long because of size of dataframe. The problem is this:
I have a dataframe like this (ignore the first 3 columns):
Column D has only 1 (True) or 0 (False). 0 (or False) marks the end of a series of 1's (or True's). So the 0 in the 4th row means that there were four 1's in col D. Again, going down in col D, there were two 1's. Then only one 1's and so on.
A B C D
2 3 6 1
4 8 2 1
2 3 6 1
4 8 2 0
2 3 6 1
4 8 2 1
2 3 6 0
4 8 2 0
2 3 6 0
4 8 2 1
2 3 6 1
4 8 2 1
3 4 1 0
...
8 2 3 1
6 2 0 0
I would like to insert a column 'Interval' which shows these intervals like this.
A B C D Interval
2 3 6 1
4 8 2 1
2 3 6 1
4 8 2 0 4
2 3 6 1
4 8 2 1
2 3 6 0 3
4 8 2 0 1
2 3 6 0 1
4 8 2 1
2 3 6 1
4 8 2 1
3 4 1 0 4
...
8 2 3 1
6 2 0 0 2
I dont actually care which row the interval number is written or it can output the column somewhere else where I can do histograms, average intervals, etc.
Any way I can do this without iterating over the rows individually?
Upvotes: 1
Views: 860
Reputation: 5354
We can do that by writing a function that iterate the list (D). We go through the list, initialize a counter by 1, whenever we find one we increment, whenever we find 0, we affect the value and re-do the same process.
import pandas as pd
import copy
df = pd.DataFrame([1,1,1,0,1,1,0,0,0,1,1,1,0])
df.columns = ['D']
d= copy.copy(df.D)
def transform(l):
count=1
for index,x in enumerate(l):
if x==0:
l[index]=count
count=1
else:
l[index]=0
count+=1
return l
df['intervales']=transform(t)
df['D']=d
print df
The Output:
D intervales
0 1 0
1 1 0
2 1 0
3 0 4
4 1 0
5 1 0
6 0 3
7 0 1
8 0 1
9 1 0
10 1 0
11 1 0
12 0 4
I tried to do that using itertools, but it leads to treat many cases.
# import itertools
# l= [list(g) for k,g in itertools.groupby(df.D,lambda x:x in [0]) ]
Upvotes: 1