How to combine a string with a cumulative number to make another column in a pandas DataFrame?

Question

I have a DataFrame:

yr = pd.DataFrame({"age": ["(young 17 yrs)", "(young 19 yrs)", "(old)", "(young 25 yrs)",
                           "(old)", "(young 27 yrs)"]})

I want to add another column named "i_tag" that extracts the string from the "age" column, whether it is young or old, and cumulatively assigns an id number accordingly.

Required Output

yr = pd.DataFrame({"age":["(young 17 yrs)","(young 19 yrs)","(old)","(young 25 yrs)",  
"(old)","(old)"], "i_tag":["id1","id1","id2","id3", "id4","id4"]})

Dani Mesejo · Accepted Answer

You could do:

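# Extract the category word ("young" or "old") from each row
ages = yr['age'].str.extract(r'\b(old|young)\b')
tag = 'id{}'.format
# Start a new id whenever the category differs from the previous row
yr['i_tag'] = (ages != ages.shift(1)).cumsum().squeeze().apply(tag)
print(yr)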

Output

              age i_tag
0  (young 17 yrs)   id1
1  (young 19 yrs)   id1
2           (old)   id2
3  (young 25 yrs)   id3
4           (old)   id4
5  (young 27 yrs)   id5
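
To see why this works, it may help to split the one-liner into its intermediate steps. The following is a minimal sketch, not the answer's exact code: it passes expand=False so that str.extract returns a Series (which makes squeeze unnecessary), and the names category, changed and run_number are made up here for illustration.

```python
import pandas as pd

yr = pd.DataFrame({"age": ["(young 17 yrs)", "(young 19 yrs)", "(old)",
                           "(young 25 yrs)", "(old)", "(young 27 yrs)"]})

# Step 1: pull out the category word; expand=False returns a Series.
category = yr["age"].str.extract(r"\b(old|young)\b", expand=False)

# Step 2: True wherever the category differs from the previous row.
# The first row is compared against NaN, so it always counts as a change.
changed = category != category.shift()

# Step 3: a running total of the changes numbers each consecutive run.
run_number = changed.cumsum()

# Step 4: prefix the run number with "id" to build the tag.
yr["i_tag"] = "id" + run_number.astype(str)
print(yr)
```

With the question's input, changed is True on every row except the second, so run_number counts 1, 1, 2, 3, 4, 5 and the tags match the output shown above.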
