Reputation: 25
I'm trying to remove the text after a "(" delimiter. First I would like to count the rows that have at least one "(", and after that remove the text after this delimiter, including the delimiter itself.
The column that contains the text is 'Country' and looks like this:
Micronesia (Federated States of)
I expect a result like this:
Micronesia
This is what I tried to count the rows:
energy['Country'].value_counts()[['(']].sum
It returned this error:
"None of [Index(['('], dtype='object')] are in the [index]"
To remove the text after the delimiter I tried this:
energy['Country'] = energy['Country'].split("(", 1)
It returned this error:
AttributeError: 'Series' object has no attribute 'split'
How could I solve this?
Upvotes: 1
Views: 271
Reputation: 26
That's because you are trying to split a Series, not the string values in its rows. Instead, apply the split to each value:
energy['Country'].apply(lambda x: x.split('(', 1))
This returns, for each row, a list with the text before and after the first "(" delimiter. To drop everything from the delimiter onward, keep only the first part:
energy['Country'] = energy['Country'].apply(lambda x: x.split('(', 1)[0].strip())
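To make the difference concrete, here is a minimal sketch on a made-up two-row frame (the sample data is an assumption, not the asker's dataset). It shows that the Series itself has no split method, while each string stored in it does:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's 'energy' frame
energy = pd.DataFrame({'Country': ['Micronesia (Federated States of)', 'France']})

# A Series has no .split method; only the strings stored in it do
series_has_split = hasattr(energy['Country'], 'split')

# apply() calls the lambda once per value, so x is a plain str here
cleaned = energy['Country'].apply(lambda x: x.split('(', 1)[0].strip())

print(series_has_split)
print(cleaned.tolist())
```

Values without a "(" pass through unchanged, because split then returns a one-element list.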
Upvotes: 0
Reputation: 636
You can iterate over each row and remove the text after '(' using str slicing:
p_count = 0
for index, row in energy.iterrows():
    country = row['Country']
    if '(' in country:
        p_count += 1
        # write back with .at; assigning to row would only change a copy
        energy.at[index, 'Country'] = country[:country.find('(')].strip()
Upvotes: 0
Reputation: 9941
You can apply str.split to the column, then take the first element with .str[0] and remove leading/trailing spaces with str.strip:
df = pd.DataFrame({'country': ['Micronesia (Federated States of)']})
df['country'] = df['country'].str.split('(').str[0].str.strip()
df
Output:
      country
0  Micronesia
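Another option uses str.extract with a regex that captures everything before the first "(" (or the whole value when there is none):
df['country'] = df['country'].str.extract(r'^(.*?)\s*(?:\(|$)', expand=False)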
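The question also asks how many rows contain a "(". A minimal sketch of one way to count them before cleaning, assuming a small made-up frame (the sample rows are not from the asker's data):

```python
import pandas as pd

# Hypothetical sample data; the real column is energy['Country'] in the question
df = pd.DataFrame({'country': ['Micronesia (Federated States of)',
                               'France',
                               'Bolivia (Plurinational State of)']})

# regex=False treats "(" as a literal character, not as a regex group opener
has_paren = df['country'].str.contains('(', regex=False)
n_rows_with_paren = int(has_paren.sum())

# Keep only the text before the first "(" and trim surrounding spaces
df['country'] = df['country'].str.split('(').str[0].str.strip()

print(n_rows_with_paren)
print(df['country'].tolist())
```

Summing the boolean mask works because True counts as 1 and False as 0.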
Upvotes: 3
Reputation: 5889
Try this. It splits each row's value rather than the whole column:
for index, row in energy.iterrows():
    print(row['Country'].split("(")[0])
If you want to do this for a specific row, you can do:
print(energy['Country'][0].split("(")[0])
Upvotes: 0
Reputation: 82785
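Using .str.replace with a regex that matches from the first "(" to the end of the value. Pass regex=True explicitly, since pandas 2.0 and later treat the pattern as a literal string by default.
Ex:
energy['Country'] = energy['Country'].str.replace(r"\s*\(.*", "", regex=True)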
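A quick check of the regex approach on made-up data (the sample rows are assumptions). This sketch passes regex=True and uses a pattern that also removes the space before the "(", so no separate strip is needed:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's 'energy' frame
energy = pd.DataFrame({'Country': ['Micronesia (Federated States of)', 'France']})

# \s*  optional whitespace before the parenthesis
# \(   a literal opening parenthesis
# .*   everything after it, up to the end of the string
energy['Country'] = energy['Country'].str.replace(r'\s*\(.*', '', regex=True)

print(energy['Country'].tolist())
```

Rows without a "(" are left as they are, since the pattern finds nothing to replace.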
Upvotes: 2