Reputation: 55
I have a column of strings in a dataframe, e.g. Warsaw (Warsaw University of Technology), and want to strip the part that starts at ( and goes to the end of the string. The parts contained in brackets are different in every row. How can I do that?
Upvotes: 0
Views: 37
Reputation: 33938
No need for regex. To throw away everything from the first occurrence of an opening parenthesis onward:
df['col1'].str.partition('(')[0]
Alternatively, you could use a lambda:
df['col1'].apply(lambda s: s.split('(', 1)[0])
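Note that both expressions keep the space that sits before the parenthesis, so the result is 'Warsaw ' rather than 'Warsaw'. A minimal sketch that also drops that trailing whitespace, assuming the column is named col1; the sample values, including the 'no brackets' row, and the clean column name are made up for illustration:

```python
import pandas as pd

# Illustrative data; the last row has no parenthesis at all
df = pd.DataFrame({'col1': ['Warsaw (Warsaw University of Technology)',
                            'xxxx (yyyy)',
                            'no brackets']})

# partition splits each string into three columns:
# text before the first '(', the '(' itself, and the rest
before = df['col1'].str.partition('(')[0]

# the text before '(' still ends with a space, so strip it
df['clean'] = before.str.rstrip()
print(df['clean'].tolist())
```

Rows without a parenthesis pass through unchanged, because partition returns the whole string as the first part when the separator is not found.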
Upvotes: 1
Reputation: 2555
Assuming the column is named col1, the regular expression in str.extract matches from the start of the string up to the first occurrence of (:
import pandas as pd

d = {'col1': ['Warsaw (Warsaw University of Technology)', 'xxxx (yyyy)']}
df = pd.DataFrame(data=d)
# Capture everything from the start of the string up to the first '('
df['col1'] = df['col1'].str.extract(r'^([^(]*)', expand=False)
print(df)
This prints:
col1
0 Warsaw
1 xxxx
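The extracted values still end with the space that preceded the parenthesis, which is not visible in the printed frame. If that matters, one possible alternative is to delete the unwanted part with str.replace instead of extracting the wanted part. This is only a sketch under the same assumption of a column named col1:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['Warsaw (Warsaw University of Technology)',
                            'xxxx (yyyy)']})

# \s*   optional whitespace before the parenthesis
# \(    a literal opening parenthesis
# .*$   everything after it, up to the end of the string
df['col1'] = df['col1'].str.replace(r'\s*\(.*$', '', regex=True)
print(df['col1'].tolist())
```

Strings without a parenthesis are left as they are, since the pattern does not match them.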
Upvotes: 0