Reputation: 51
I'm new to Python/pandas and I'm losing my hair over regex. I would like to use str.replace() to modify strings in a DataFrame.
I have a 'Names' column in a DataFrame df which looks like this:
Jeffrey[1]
Mike[3]
Philip(1)
Jeffrey[2]
etc...
In each row of the column, I would like to remove the end of the string, starting from either the '[' or the '('.
I thought of using something like the line below, but I have a hard time understanding regex. Any tips on a good regex summary for beginners are welcome.
df['Names']=df['Names'].str.replace(r'REGEX??', '')
Thanks!
Upvotes: 2
Views: 40
Reputation: 59549
You could use split to take everything before the first [ or ( character.
df['Names'].str.split(r'\[|\(').str[0]
Names
0 Jeffrey
1 Mike
2 Philip
3 Jeffrey
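For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The DataFrame is rebuilt from the sample rows in the question, and the variable name cleaned is only illustrative. The pattern is written as a raw string so the backslashes reach the regex engine unchanged, and n=1 is an optional addition that stops after the first split.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'Names': ['Jeffrey[1]', 'Mike[3]', 'Philip(1)', 'Jeffrey[2]']})

# Split on the first '[' or '(' and keep the part before it.
# A multi-character pattern is treated as a regex by str.split;
# n=1 limits each row to a single split.
cleaned = df['Names'].str.split(r'\[|\(', n=1).str[0]

print(cleaned.tolist())
# ['Jeffrey', 'Mike', 'Philip', 'Jeffrey']
```

Note that this returns a new Series; assign it back to df['Names'] if you want to overwrite the column.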
Upvotes: 2
Reputation: 42906
Extract only the alphabetic letters with Series.str.extract:
df['Names'] = df['Names'].str.extract('([A-Za-z]+)')
Names
0 Jeffrey
1 Mike
2 Philip
3 Jeffrey
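One thing worth knowing about this pattern: it keeps only the first run of ASCII letters, so it stops at any hyphen, space, or other non-letter, not just at a bracket. The sketch below uses Python's re module to show what the pattern matches. The last two sample names are hypothetical and do not come from the question.

```python
import re

# Same pattern as in the str.extract call: one or more ASCII letters.
pattern = re.compile(r'([A-Za-z]+)')

# The first two values are from the question; the last two are
# made-up examples of names that contain a non-letter character.
samples = ['Jeffrey[1]', 'Philip(1)', 'Mary-Ann[2]', 'Anna Maria(3)']

# search() returns the first match, which is also what str.extract keeps.
first_runs = [pattern.search(s).group(1) for s in samples]

print(first_runs)
# ['Jeffrey', 'Philip', 'Mary', 'Anna']
```

If your real names can contain such characters, one of the approaches that cuts at the bracket is probably the safer choice.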
Upvotes: 3
Reputation: 150735
This regex would work, where $ indicates the end of the string:
df['Names'] = df['Names'].str.extract(r'(.*)[\[(]\d+[\])]$')
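Since the question asked for str.replace specifically, a similar end-anchored pattern can be used there too. This is a minimal sketch, not the only way to do it: the DataFrame is rebuilt from the sample rows in the question, and regex=True is passed explicitly because the default for that argument changed to False in pandas 2.0.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'Names': ['Jeffrey[1]', 'Mike[3]', 'Philip(1)', 'Jeffrey[2]']})

# [\[(]  matches a single '[' or '('
# .*     matches everything that follows it
# $      anchors the match at the end of the string
# The whole match is replaced with an empty string.
df['Names'] = df['Names'].str.replace(r'[\[(].*$', '', regex=True)

print(df['Names'].tolist())
# ['Jeffrey', 'Mike', 'Philip', 'Jeffrey']
```

Unlike the extract pattern above, this one does not require digits and a closing bracket, so it also strips a suffix such as [a] or an unclosed (1.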
Upvotes: 2