Reputation: 2158
I am trying to remove a trailing 'OF' from a column in a pandas DataFrame. I tried 'rstrip' and 'split', but rstrip also removes a lone trailing 'O' or 'F'; I only want to remove the exact suffix 'OF'. How can I do that? I'm not sure why rstrip removes 'O' and 'F' when I specifically passed 'OF'. Sorry if this has been asked before; I couldn't find it. Thanks.
Sample Data:
import pandas as pd
l1 = [1,2,3,4]
l2 = ['UNIVERSITY OF CONN. OF','ONTARIO','UNIV. OF TORONTO','ALASKA DEPT.OF']
df = pd.DataFrame({'some_id':l1,'org':l2})
df
some_id org
1 UNIVERSITY OF CONN. OF
2 ONTARIO
3 UNIV. OF TORONTO
4 ALASKA DEPT.OF
Tried:
df.org.str.rstrip('OF')
# df.org.str.split('OF')[0] # Not what I am looking for
Results:
0 UNIVERSITY OF CONN. # works
1 ONTARI # 'O' was removed
2 UNIV. OF TORONT # 'O' was removed
3 ALASKA DEPT. # works
Final output needed:
0 UNIVERSITY OF CONN.
1 ONTARIO
2 UNIV. OF TORONTO
3 ALASKA DEPT.
Upvotes: 3
Views: 577
Reputation: 59549
str.extract
Capture everything up to, but not including, a single optional 'OF' at the end of the string. I added a few more rows as test cases.
df['extract'] = df.org.str.extract(r'(.*?)(?=(?:OF$)|$)', expand=False)
# some_id org extract
#0 1 UNIVERSITY OF CONN. OF UNIVERSITY OF CONN.
#1 2 ONTARIO ONTARIO
#2 3 UNIV. OF TORONTO UNIV. OF TORONTO
#3 4 ALASKA DEPT.OF ALASKA DEPT.
#4 5 fooOFfooOFOF fooOFfooOF
#5 6 fF fF
#6 7 Seven Seven
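To see what the lookahead pattern captures without building a DataFrame, the same regex can be tried with Python's re module. This is only a sketch: strip_trailing_of is a hypothetical helper name, and it relies on str.extract taking its groups from the first match, which is what re.search returns.

```python
import re

# Same pattern as in the str.extract call: lazily capture everything up to
# a trailing 'OF' if there is one, otherwise up to the end of the string.
PATTERN = re.compile(r'(.*?)(?=(?:OF$)|$)')

def strip_trailing_of(text):
    # Hypothetical helper: return group 1 of the first match.
    return PATTERN.search(text).group(1)

samples = ['UNIVERSITY OF CONN. OF', 'ONTARIO', 'fooOFfooOFOF', 'fF', 'Seven']
results = [strip_trailing_of(s) for s in samples]
for original, result in zip(samples, results):
    print(repr(original), '->', repr(result))
```

Only one 'OF' is removed per string, and the space before a removed 'OF' is kept, so the first sample ends in 'CONN. ' with a trailing space.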
Upvotes: 0
Reputation: 150735
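You can use a regex with str.replace:
df['org'] = df['org'].str.replace(r'OF$', '', regex=True)
where $ anchors the match at the end of the string. Pass regex=True explicitly, because recent pandas versions treat the pattern as a literal string by default.
Note that df.org.str.rstrip('(OF)') does not help: rstrip treats its argument as a set of characters to strip, not as a suffix, so it still turns 'ONTARIO' into 'ONTARI'. That is also why rstrip('OF') removed the single 'O' in the question.
Output of the str.replace call: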
0 UNIVERSITY OF CONN.
1 ONTARIO
2 UNIV. OF TORONTO
3 ALASKA DEPT.
Name: org, dtype: object
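The difference between stripping a set of characters and removing a suffix can be checked side by side on the sample data. This is a minimal sketch: the variable names are illustrative, and Series.str.removesuffix is assumed to be available, which requires pandas 1.4 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'some_id': [1, 2, 3, 4],
    'org': ['UNIVERSITY OF CONN. OF', 'ONTARIO',
            'UNIV. OF TORONTO', 'ALASKA DEPT.OF'],
})

# rstrip strips any trailing run of the characters 'O' and 'F',
# so a lone trailing 'O' is removed as well.
stripped = df['org'].str.rstrip('OF')

# An anchored regex removes only the exact suffix 'OF'.
replaced = df['org'].str.replace(r'OF$', '', regex=True)

# Assumes pandas >= 1.4: removesuffix drops the literal suffix, no regex needed.
suffix_removed = df['org'].str.removesuffix('OF')

print(pd.DataFrame({'rstrip': stripped,
                    'replace': replaced,
                    'removesuffix': suffix_removed}))
```

Both suffix-based variants leave the space before the removed 'OF' in the first row; chaining .str.rstrip() with no argument afterwards would trim that whitespace if it is unwanted.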
Upvotes: 4