rshar

Extract the text inside parentheses from a pandas DataFrame column and write the result back to the same column

I want to extract only the text inside the parentheses and store it back in the same column.

I have the following dataframe df:

id     feature
1      mutation(MI:0118)
2      mutation(MI:0119)
3      mutation(MI:01120)

The expected output is:

id     feature
1      MI:0118
2      MI:0119
3      MI:01120

I tried the following regex, but I can't assign the result back to the same column:

df['feature'] = df['feature'].str.extract(r"\((.*?)\)", expand=False)

I get the following warning, and the code above converts every value in the feature column to NaN:

/home/lib/python2.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.

Thanks
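The warning text says `df` is a slice of another DataFrame, but the post does not show how `df` was created. The sketch below assumes a hypothetical parent frame named `raw` and a hypothetical row filter. Taking an explicit `.copy()` of the slice makes `df` an independent DataFrame, so assigning to a column no longer triggers `SettingWithCopyWarning`. On the sample data shown, the question's own non-greedy pattern returns the expected values.

```python
import pandas as pd

# Hypothetical parent frame: the question does not show where df came from,
# only that the warning says it is a slice of another DataFrame.
raw = pd.DataFrame({
    "id": [1, 2, 3],
    "feature": ["mutation(MI:0118)", "mutation(MI:0119)", "mutation(MI:01120)"],
})

# Hypothetical row filter standing in for whatever slicing produced df.
# .copy() makes df an independent DataFrame instead of a possible view of raw,
# so assigning a column to df does not raise SettingWithCopyWarning.
df = raw[raw["id"] > 0].copy()

# The non-greedy pattern from the question, written back to the same column.
df["feature"] = df["feature"].str.extract(r"\((.*?)\)", expand=False)
print(df)
```

Because `df` is a copy, `raw` keeps its original `feature` strings.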


Answers (1)

U13-Forward

Try a different pattern that matches the whole string and captures only the text between the parentheses:

df['feature'] = df['feature'].str.extract(r'.*\((.*)\).*', expand=False)
print(df)

Output:

   id   feature
0   1   MI:0118
1   2   MI:0119
2   3  MI:01120
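On the sample data, this pattern and the non-greedy `\((.*?)\)` from the question produce the same result. They differ only when a string contains more than one parenthesized group. The sketch below compares them. The last two values are hypothetical inputs, not from the question. `str.extract` searches each string. The answer's leading greedy `.*` pushes the match to the last `(`, while the non-greedy pattern stops at the first `(...)` group. Strings with no parentheses become NaN under either pattern.

```python
import pandas as pd

# The first three values come from the question; the last two are
# hypothetical strings added to show where the two patterns differ.
s = pd.Series([
    "mutation(MI:0118)",
    "mutation(MI:0119)",
    "mutation(MI:01120)",
    "a(X)b(Y)c",       # hypothetical: two parenthesized groups
    "no parentheses",  # hypothetical: nothing to extract
])

# Pattern from the question: non-greedy, captures the FIRST (...) group.
first = s.str.extract(r"\((.*?)\)", expand=False)

# Pattern from the answer: the leading greedy .* backtracks only as far as
# the LAST "(" in the string, so it captures the last (...) group.
last = s.str.extract(r".*\((.*)\).*", expand=False)

print(pd.DataFrame({"first": first, "last": last}))
```

If each value is known to hold exactly one pair of parentheses, either pattern works. Otherwise, pick the one that matches the group you actually want.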
