Arseniy Krupenin

Reputation: 3880

Pandas: Add new column to df with condition

I have a DataFrame df and I need to add a new column to it.

i,ID,url,used_at,active_seconds,domain,search_term, diff_time
322015,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/antoninaribina,2015-10-31 09:16:05,35,vk.com,None, 108    
838267,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed,2015-10-31 09:16:38,54,vk.com,None, 79 
838271,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos,2015-11-30 09:17:32,34,vk.com,None, 513   
322026,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos&z=photo143297356_397216312%2Ffeed1_143297356_1451504298,2015-11-30 09:18:06,4,vk.com,None, 242
838275,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos,2015-12-31 09:18:10,4,vk.com,None, 131    
322028,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=comments,2015-12-31 09:18:14,8,vk.com,None, 317  
322029,f85ce4b2f8787d48edc8612b2ccaca83,megarand.ru/contest/121070,2015-12-31 09:18:22,16,megarand.ru,None, 17  
1870917,f85ce4b2f8787d48edc8612b2ccaca83,eldorado.ru/cat/1461428,2015-12-31 09:18:38,6,vk.com,None, 129  
1354612,f85ce4b2f8787d48edc8612b2ccaca83,vk.com/antoninaribina,2015-12-31 19:18:44,56,vk.com,None, 417   

I want to add a column period that works as a running counter, where i is the period of the previous row: if diff_time < 500, period = i; if diff_time > 500, period = i + 1; and if the ID in a row differs from the ID in the previous row, period = i + 1.

Desired output:

i,ID,url,used_at,active_seconds,domain,search_term, diff_time, period
322015,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/antoninaribina,2015-10-31 09:16:05,35,vk.com,None, 108, 1    
838267,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed,2015-10-31 09:16:38,54,vk.com,None, 79, 1 
838271,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos,2015-11-30 09:17:32,34,vk.com,None, 513, 2   
322026,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos&z=photo143297356_397216312%2Ffeed1_143297356_1451504298,2015-11-30 09:18:06,4,vk.com,None, 242, 2
838275,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos,2015-12-31 09:18:10,4,vk.com,None, 131, 2    
322028,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=comments,2015-12-31 09:18:14,8,vk.com,None, 317, 2  
322029,f85ce4b2f8787d48edc8612b2ccaca83,megarand.ru/contest/121070,2015-12-31 09:18:22,16,megarand.ru,None, 17, 3  
1870917,f85ce4b2f8787d48edc8612b2ccaca83,eldorado.ru/cat/1461428,2015-12-31 09:18:38,6,vk.com,None, 129, 3  
1354612,f85ce4b2f8787d48edc8612b2ccaca83,vk.com/antoninaribina,2015-12-31 19:18:44,56,vk.com,None, 517, 4 

Upvotes: 0

Views: 431

Answers (1)

akuiper

Reputation: 214927

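Build a boolean series, switch, that is True on every row where the period should increase (diff_time is above 500, or the ID differs from the previous row) and False otherwise. Then call cumsum() on it: each True adds 1 to the running total, so adding 1 to the result gives the period number.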

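# True on rows where a new period starts: a long gap or a change of ID
switch = (df.diff_time > 500) | (df.ID != df.ID.shift().fillna(df.ID.iloc[0]))
# Running count of True values, offset so the first period is 1
switch.cumsum() + 1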

# 0    1
# 1    1
# 2    2
# 3    2
# 4    2
# 5    2
# 6    3
# 7    3
# 8    4
# dtype: int64

Assigning this back to your data frame should give you what you need:

df['period'] = switch.cumsum() + 1
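
For anyone who wants to try this without the full file, here is a minimal self-contained sketch of the same approach. It assumes a toy frame with only the two columns the rule depends on: the IDs are shortened to the placeholders "a" and "b", and the diff_time values are taken from the desired-output table in the question (so the last row is 517, not the 417 shown in the input table).

```python
import pandas as pd

# Toy frame: placeholder IDs, diff_time values from the desired-output table.
df = pd.DataFrame({
    "ID": ["a", "a", "a", "a", "a", "a", "b", "b", "b"],
    "diff_time": [108, 79, 513, 242, 131, 317, 17, 129, 517],
})

# A new period starts on a long gap...
long_gap = df["diff_time"] > 500
# ...or when the ID differs from the previous row. shift() leaves NaN in the
# first row, so fill it with the first ID to keep that row from counting.
new_id = df["ID"] != df["ID"].shift().fillna(df["ID"].iloc[0])
switch = long_gap | new_id

# Booleans sum as 0/1, so the running total counts period starts seen so far.
df["period"] = switch.cumsum() + 1
print(df["period"].tolist())
```

Splitting the condition into long_gap and new_id is only for readability; the combined one-liner above does the same thing. Note that a row with diff_time exactly equal to 500 does not start a new period here, since the question does not say which side it belongs to.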

Upvotes: 2
