Reputation: 470
I have data like this:
import pandas as pd

data = [['A', 0], ['A', 1], ['A', 2], ['A', 15], ['A', 2], ['A', 12], ['B', 1], ['B', 3]]
df = pd.DataFrame(data, columns=['name', 'interval'])
name interval
0 A 0
1 A 1
2 A 2
3 A 15
4 A 2
5 A 12
6 B 1
7 B 3
I want to create a new name based on the interval: within each name, every time the interval is greater than 10, a new name is generated, but it still uses the original name as a prefix, like this (the names here are just an example):
name interval new_name
0 A 0 A_0
1 A 1 A_0
2 A 2 A_0
3 A 15 A_1
4 A 2 A_1
5 A 12 A_2
6 B 1 B_0
7 B 3 B_0
My current code loops over every row with a for loop. Is there another way to do this? Thank you.
######################
Credit to Rutger for the idea. This is the flow:
name interval condition cumsum new_name(name+"_"+cumsum)
0 A 0 False 0 A_0
1 A 1 False 0 A_0
2 A 2 False 0 A_0
3 A 15 True 1 A_1
4 A 2 False 1 A_1
5 A 12 True 2 A_2
6 B 1 False 0 B_0
7 B 3 False 0 B_0
The code details are in Rutger's answer.
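For reference, the flow in the table above can be sketched step by step as below. This is only a minimal sketch using the sample data from the question; the intermediate column names condition and cumsum are taken from the table header and are not required by pandas. The boolean flag is cast to int before the cumulative sum so the counter is an integer regardless of pandas version.

```python
import pandas as pd

data = [['A', 0], ['A', 1], ['A', 2], ['A', 15], ['A', 2], ['A', 12], ['B', 1], ['B', 3]]
df = pd.DataFrame(data, columns=['name', 'interval'])

# Step 1: flag the rows whose interval is greater than 10.
df['condition'] = df['interval'] > 10

# Step 2: running count of flagged rows, restarted for every name.
df['cumsum'] = df['condition'].astype(int).groupby(df['name']).cumsum()

# Step 3: join the original name and the counter with an underscore.
df['new_name'] = df['name'] + '_' + df['cumsum'].astype(str)

print(df)
```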
Upvotes: 1
Views: 52
Reputation: 603
I think the easiest way is to start by creating a boolean series and then build the new field from it, like this:
df['large_interval'] = 10 < df['interval']
df['new_name'] = df['name'] + '_' + df.groupby('name')['large_interval'].cumsum().astype(str)
The second line counts, per group, how many large intervals have occurred so far. That count is converted to a string and appended after the name and an underscore.
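If you would rather not keep the helper column, the same idea could be wrapped in a small function. This is just a sketch: add_new_name and its threshold parameter are names I made up for illustration, and it assumes the columns are called name and interval as in the question.

```python
import pandas as pd


def add_new_name(frame, threshold=10):
    """Return a copy of frame with a new_name column (hypothetical helper)."""
    out = frame.copy()
    # 1 where the interval exceeds the threshold, 0 otherwise.
    is_large = (out['interval'] > threshold).astype(int)
    # Running count of large intervals within each name.
    counter = is_large.groupby(out['name']).cumsum()
    out['new_name'] = out['name'] + '_' + counter.astype(str)
    return out


data = [['A', 0], ['A', 1], ['A', 2], ['A', 15], ['A', 2], ['A', 12], ['B', 1], ['B', 3]]
df = pd.DataFrame(data, columns=['name', 'interval'])

result = add_new_name(df)
print(result)
```

Note that the counter is kept per name, not per consecutive block of rows, so if the same name shows up again later in the frame the count continues from where it left off.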
Upvotes: 1