FedePy

Reputation: 51

Is there an easy way to remove the end of the string in each row of a dataframe?

I'm new to Python/pandas and I'm losing my hair over regex. I would like to use str.replace() to modify strings in a dataframe.

I have a 'Names' column in a dataframe df which looks like this:

Jeffrey[1] 
Mike[3]
Philip(1)
Jeffrey[2]
etc...

In each row of the column, I would like to remove the end of the string, starting from either the '[' or the '('.

I thought of using something like the line below, but I have a hard time understanding regex. Any tips on a good regex summary for beginners are welcome.

df['Names']=df['Names'].str.replace(r'REGEX??', '')

Thanks!

Upvotes: 2

Views: 40

Answers (3)

ALollz

Reputation: 59549

You could use split to take everything before the first [ or ( character. A raw string keeps the backslashes from being read as Python escape sequences.

df['Names'].str.split(r'\[|\(').str[0]

     Names
0  Jeffrey
1     Mike
2   Philip
3  Jeffrey
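
For reference, a minimal self-contained sketch of the split approach. The sample data is assumed from the question, with an extra bracket-free name ("Anna") added to show that such rows pass through unchanged. Passing n=1 is an optional tweak that stops after the first split.

```python
import pandas as pd

# Sample data assumed from the question; "Anna" is added for illustration.
df = pd.DataFrame(
    {"Names": ["Jeffrey[1]", "Mike[3]", "Philip(1)", "Jeffrey[2]", "Anna"]}
)

# Split once on the first '[' or '(' and keep the part before it.
# A pattern longer than one character is treated as a regex by str.split.
cleaned = df["Names"].str.split(r"[\[(]", n=1).str[0]

print(cleaned.tolist())
```

Assign the result back with df['Names'] = ... to overwrite the column in place.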

Upvotes: 2

Erfan

Reputation: 42906

Extract only the alphabetic letters with Series.str.extract:

df['Names'] = df['Names'].str.extract('([A-Za-z]+)')

     Names
0  Jeffrey
1     Mike
2   Philip
3  Jeffrey
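
One caveat worth checking against your data: the letters-only pattern stops at the first non-letter, so a hyphenated or spaced name would be cut short. The sketch below compares it with a pattern that captures everything up to the first bracket. The name "Mary-Ann[2]" is a made-up example, not from the question, and expand=False is used so that extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# "Mary-Ann[2]" is a hypothetical name used to show the difference.
names = pd.Series(["Jeffrey[1]", "Philip(1)", "Mary-Ann[2]"])

# Letters only: stops at the first character that is not A-Z or a-z.
letters_only = names.str.extract(r"([A-Za-z]+)", expand=False)

# Everything from the start up to (not including) the first '[' or '('.
up_to_bracket = names.str.extract(r"^([^\[(]+)", expand=False)

print(letters_only.tolist())
print(up_to_bracket.tolist())
```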

Upvotes: 3

Quang Hoang

Reputation: 150735

This regex would work, where $ marks the end of the string. Inside a character class, | is a literal pipe rather than "or", so it is left out here:

df['Names'] = df['Names'].str.extract(r'(.*)[\[(]\d+[\])]$')
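
Since the question asked about str.replace, here is a sketch of the same anchored pattern used that way, next to the extract version. The sample data is assumed, with "Anna" added as a row that has no bracketed suffix. Note that regex=True has to be passed explicitly in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default.

```python
import pandas as pd

# Sample data assumed from the question; "Anna" has no suffix to remove.
names = pd.Series(["Jeffrey[1]", "Mike[3]", "Philip(1)", "Anna"])

# '[' or '(' followed by digits and a closing ']' or ')' at the end of the string.
suffix = r"[\[(]\d+[\])]$"

# str.replace deletes the suffix and leaves non-matching rows untouched.
replaced = names.str.replace(suffix, "", regex=True)

# str.extract keeps the captured prefix but yields NaN where nothing matches.
extracted = names.str.extract(r"(.*)" + suffix, expand=False)

print(replaced.tolist())
print(extracted.tolist())
```

If some rows may lack the bracketed suffix, the str.replace form is probably the safer choice, because it leaves those rows as they are instead of turning them into NaN.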

Upvotes: 2
