Amit

Reputation: 793

How to combine a string with a cumulative number to make another column in a pandas DataFrame?

I have a DataFrame:

yr = pd.DataFrame({"age": ["(young 17 yrs)", "(young 19 yrs)", "(old)", "(young 25 yrs)",
                           "(old)", "(young 27 yrs)"]})

I want to add another column named "i_tag" that extracts the string from the "age" column (whether it is young or old) and assigns an id number that increases each time that value changes from one row to the next.

Required Output

yr = pd.DataFrame({"age": ["(young 17 yrs)", "(young 19 yrs)", "(old)", "(young 25 yrs)",
                           "(old)", "(old)"],
                   "i_tag": ["id1", "id1", "id2", "id3", "id4", "id4"]})

Upvotes: 0

Views: 25

Answers (2)

Dani Mesejo

Reputation: 61910

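You could extract the label, flag every row where it differs from the previous row, and take the cumulative sum of those flags: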

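# Pull out "old" or "young"; with the default expand=True this is a one-column DataFrame
ages = yr['age'].str.extract(r'\b(old|young)\b')
tag = 'id{}'.format
# Flag rows whose label differs from the previous row, number the runs, then format as "idN"
yr['i_tag'] = (ages != ages.shift(1)).cumsum().squeeze().apply(tag)
print(yr)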

Output

              age i_tag
0  (young 17 yrs)   id1
1  (young 19 yrs)   id1
2           (old)   id2
3  (young 25 yrs)   id3
4           (old)   id4
5  (young 27 yrs)   id5
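
To see why this works, here is a minimal sketch that splits the same idea into named steps. The names label, is_new_run and run_id are illustrative and not part of the answer, and passing expand=False to str.extract (so that the result is a Series and squeeze() is not needed) is a substitution I am assuming is acceptable, not the answerer's exact code.

```python
import pandas as pd

yr = pd.DataFrame({"age": ["(young 17 yrs)", "(young 19 yrs)", "(old)",
                           "(young 25 yrs)", "(old)", "(young 27 yrs)"]})

# expand=False returns a Series of labels instead of a one-column DataFrame.
label = yr["age"].str.extract(r"\b(old|young)\b", expand=False)

# True wherever the label differs from the previous row.
# The first row compares against NaN, so it is always True.
is_new_run = label != label.shift()

# Summing the flags cumulatively gives a 1-based run number for each row.
run_id = is_new_run.cumsum()

# Prefix the run number with "id" to build the tag (illustrative formatting step).
yr["i_tag"] = "id" + run_id.astype(str)
print(yr)
```

Rows only share a tag when they are adjacent and carry the same label, which is why the two "(old)" rows at positions 2 and 4 get different ids here.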

Upvotes: 1

BENY

Reputation: 323306

Use findall to pull out the label, then combine shift and cumsum to number each run of consecutive equal labels:

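# First match of 'young' or 'old' in each row
s = yr.age.str.findall('young|old').str[0]
# The counter goes up by one whenever the label differs from the previous row
yr['tag'] = s.ne(s.shift()).cumsum()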
yr
Out[342]: 
              age  tag
0  (young 17 yrs)    1
1  (young 19 yrs)    1
2           (old)    2
3  (young 25 yrs)    3
4           (old)    4
5  (young 27 yrs)    5
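
The question asks for string tags such as "id1" in a column named "i_tag", while the code above produces integers in "tag". A hedged sketch of one way to close that gap with the same findall/shift/cumsum idea follows; the variable run_number and the string concatenation step are my own additions for illustration. The data is taken from the question's "Required Output" frame, where the last two rows are both "(old)".

```python
import pandas as pd

# Data copied from the question's "Required Output" (last two rows are both "(old)").
yr = pd.DataFrame({"age": ["(young 17 yrs)", "(young 19 yrs)", "(old)",
                           "(young 25 yrs)", "(old)", "(old)"]})

# First match of 'young' or 'old' in each row.
s = yr["age"].str.findall("young|old").str[0]

# Integer run number: increases whenever the label differs from the previous row.
run_number = s.ne(s.shift()).cumsum()

# Convert the integers to strings and add the "id" prefix.
yr["i_tag"] = "id" + run_number.astype(str)
print(yr)
```

With this input the two trailing "(old)" rows fall into the same run, so both get "id4", matching the tags listed in the question.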

Upvotes: 1
