Jhoan Zuluaga

Reputation: 25

Remove text after a delimiter (parenthesis) in python

I'm trying to remove the text after a "(" delimiter. First I would like to count the rows that contain at least one "(", and then remove the text after this delimiter, including the delimiter itself.

The column that contains the text is 'Country' and looks like this:

Micronesia (Federated States of)

I would like a result like this:

Micronesia

This is what I tried in order to count the rows:

energy['Country'].value_counts()[['(']].sum

It returned this error:

"None of [Index(['('], dtype='object')] are in the [index]"

To remove the text after the delimiter I tried this:

energy['Country'] = energy['Country'].split("(", 1)

It returned this error:

AttributeError: 'Series' object has no attribute 'split'

How could I solve this?

Upvotes: 1

Views: 271

Answers (5)

hamidipour_AM

Reputation: 26

That's because you are trying to split the Series itself, not the string values in its rows. Apply the split to each value instead:

energy['Country'].apply(lambda x: x.split('(', 1))

This returns, for each row, a list holding the text before and after the first "(". To keep only the text before the delimiter, take the first element of that list and strip the trailing space:

energy['Country'] = energy['Country'].apply(lambda x: x.split('(', 1)[0].strip())
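To see what split hands back for a single value, here is a minimal sketch in plain Python using the sample string from the question. The variable names are illustrative only.

```python
# Sample value from the question; variable names are illustrative only.
text = "Micronesia (Federated States of)"

# maxsplit=1 splits at the first "(" only, giving a two-element list.
parts = text.split("(", 1)

# The first element still carries the space that preceded "(".
before = parts[0]
cleaned = before.strip()

# A value without "(" comes back as a one-element list, so [0] is still safe.
no_paren = "Japan".split("(", 1)

print(parts)
print(repr(cleaned))
print(no_paren)
```

Because a value without "(" still yields a one-element list, taking element 0 leaves such rows unchanged.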

Upvotes: 0

Mohammad Anvari

Reputation: 636

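You can iterate over the rows and remove the text after '(' using string slicing. Write the result back with energy.at, because assigning to the row yielded by iterrows is not guaranteed to change the DataFrame:

p_count = 0
for index, row in energy.iterrows():
    country = row['Country']
    if '(' in country:
        p_count += 1  # count rows that contain "("
        # keep the text before the first "(" and drop the trailing space
        energy.at[index, 'Country'] = country[:country.find('(')].strip()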
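The count from the loop above can also be obtained without iterating. This is a minimal sketch on made-up data (the real `energy` frame is not shown in the question), using `Series.str.contains` with `regex=False` so that "(" is treated as a literal character.

```python
import pandas as pd

# Hypothetical sample data; the real `energy` frame is not shown in the question.
energy = pd.DataFrame({
    "Country": [
        "Micronesia (Federated States of)",
        "Japan",
        "Bolivia (Plurinational State of)",
    ]
})

# regex=False treats "(" as a literal character rather than a regex group opener.
has_paren = energy["Country"].str.contains("(", regex=False)

# True counts as 1, so summing the boolean mask gives the number of matching rows.
p_count = int(has_paren.sum())
print(p_count)
```

Run the count before cleaning the column, since no "(" is left to find afterwards.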

Upvotes: 0

perl

Reputation: 9941

You can apply str.split to the column, then take the first element with .str[0] and remove leading/trailing spaces with str.strip:

import pandas as pd

df = pd.DataFrame({'country': ['Micronesia (Federated States of)']})

df['country'] = df['country'].str.split('(').str[0].str.strip()
df

Output:

      country
0  Micronesia

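Another option is str.extract. With expand=False it returns a Series, and the pattern below drops the space before "(" and also keeps values that contain no parenthesis:

df['country'] = df['country'].str.extract(r'^([^(]*?)\s*(?:\(|$)', expand=False)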

Upvotes: 3

Buddy Bob

Reputation: 5889

Try this. It processes one row at a time instead of the whole column:

for index, row in energy.iterrows():
    print(row['Country'].split("(")[0])

If you want to do this for a specific row, you can do:

print(energy['Country'][0].split("(")[0])

Upvotes: 0

Rakesh

Reputation: 82785

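Using .str.replace with a regex. Pass regex=True explicitly, since pandas 2.0 and later treat the pattern as a literal string by default. The leading \s* also removes the space before the parenthesis.

Ex:

energy['Country'] = energy['Country'].str.replace(r"\s*\(.*", "", regex=True)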
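As a quick check of the regex approach, here is a minimal sketch on made-up data that includes a row without a parenthesis. The sample frame is hypothetical, and the pattern `\s*\(.*` is one reasonable choice for "everything from the first parenthesis onward", not the only one.

```python
import pandas as pd

# Hypothetical sample data, including a row without a parenthesis.
energy = pd.DataFrame({
    "Country": ["Micronesia (Federated States of)", "Japan"]
})

# \s* consumes whitespace before "(", \( matches the literal parenthesis,
# and .* consumes everything up to the end of the string.
cleaned = energy["Country"].str.replace(r"\s*\(.*", "", regex=True)
print(cleaned.tolist())
```

Rows that contain no "(" do not match the pattern and pass through unchanged.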

Upvotes: 2
