Reputation: 793
I have a data frame
yr = pd.DataFrame({"age":["(young 17 yrs)","(young 19 yrs)","(old)","(young 25 yrs)",
"(old)","(young 27 yrs)"]})
I want to add another column named "i_tag" that extracts whether each row of the "age" column is young or old, and assigns a cumulative id number that increases each time that value changes from the previous row.
Required Output
yr = pd.DataFrame({"age":["(young 17 yrs)","(young 19 yrs)","(old)","(young 25 yrs)",
"(old)","(old)"], "i_tag":["id1","id1","id2","id3", "id4","id4"]})
Upvotes: 0
Views: 25
Reputation: 61910
You could do:
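import pandas as pd

# extract 'old' or 'young' from each row (one-column DataFrame)
ages = yr['age'].str.extract(r'\b(old|young)\b')
tag = 'id{}'.format
# a new id starts wherever the label differs from the previous row
yr['i_tag'] = (ages != ages.shift(1)).cumsum().squeeze().apply(tag)
print(yr)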
Output:
              age i_tag
0  (young 17 yrs)   id1
1  (young 19 yrs)   id1
2           (old)   id2
3  (young 25 yrs)   id3
4           (old)   id4
5  (young 27 yrs)   id5
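If it helps to see why this works, here is a rough step-by-step sketch of the same idea. It assumes the input frame from the question; the names label, changed and run_id are just illustrative, and expand=False is used so that extract returns a Series and no squeeze is needed.

```python
import pandas as pd

yr = pd.DataFrame({"age": ["(young 17 yrs)", "(young 19 yrs)", "(old)",
                           "(young 25 yrs)", "(old)", "(young 27 yrs)"]})

# expand=False makes str.extract return a Series for a single capture group
label = yr["age"].str.extract(r"\b(old|young)\b", expand=False)

# True wherever the label differs from the previous row; row 0 is always True
# because shift() leaves a missing value in the first position
changed = label != label.shift()

# Running count of the changes gives a 1-based number for each consecutive run
run_id = changed.cumsum()

# Prefix the run number to build the tag strings
yr["i_tag"] = "id" + run_id.astype(str)
print(yr)
```

Each consecutive run of the same label gets one id, so a label that reappears later (like the second "old") starts a new id rather than reusing the earlier one.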
Upvotes: 1
Reputation: 323306
Let us use findall, then shift and cumsum:
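# first 'young' or 'old' match in each row
s = yr.age.str.findall('young|old').str[0]
# increment the counter whenever the label changes from the previous row
yr['tag'] = s.ne(s.shift()).cumsum()
yr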
Out[342]:
              age  tag
0  (young 17 yrs)    1
1  (young 19 yrs)    1
2           (old)    2
3  (young 25 yrs)    3
4           (old)    4
5  (young 27 yrs)    5
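Note that the question's required output has "(old)" in the last row rather than "(young 27 yrs)". As a quick check, assuming that column is the intended data (yr2 is just a hypothetical name for it), the same findall/shift/cumsum approach gives the same id to both trailing rows, and the counter can be turned into "idN" strings:

```python
import pandas as pd

# Assumption: the "age" column as it appears in the question's required output
yr2 = pd.DataFrame({"age": ["(young 17 yrs)", "(young 19 yrs)", "(old)",
                            "(young 25 yrs)", "(old)", "(old)"]})

# First 'young' or 'old' match in each row
s = yr2["age"].str.findall("young|old").str[0]

# Run counter, converted to strings and prefixed with "id"
yr2["i_tag"] = "id" + s.ne(s.shift()).cumsum().astype(str)
print(yr2["i_tag"].tolist())
```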
Upvotes: 1