Replace the value inside a csv column by value inside parentheses of the same column using python pandas

Question

I have the following CSV file with sample data:

Now I want to replace the values in the 'SIFT' and 'PolyPhen' columns with the data inside the parentheses of those columns. So for row 1 the SIFT value should become 0.82, and for row 2 it should become 0.85. I also want the part before the parentheses (tolerated/deleterious) in a new column named 'SIFT_prediction'.

This is what I tried so far:

import pandas as pd
import re

testfile = 'test_sift_columns.csv'
df = pd.read_csv(testfile)  
df['SIFT'].re.search(r'\((.*?)\)',s).group(1)

This code is meant to take everything inside the parentheses of the SIFT column, but it does not replace anything. I probably need a for loop to read and replace every row, but I don't know how to do it correctly. I am also not sure whether a regular expression is necessary with pandas. Maybe there is a smarter way to solve my problem.

jezrael · Accepted Answer

Use Series.str.extract:

df = pd.DataFrame({'SIFT':['tol(0.82)','tol(0.85)','tol(1.42)'],
                   'PolyPhen':['beg(0)','beg(0)','beg(0)']})

pat = r'(.*?)\((.*?)\)'
df[['SIFT_prediction','SIFT']] = df['SIFT'].str.extract(pat)
df[['PolyPhen_prediction','PolyPhen']] = df['PolyPhen'].str.extract(pat)

print(df)
   SIFT PolyPhen SIFT_prediction PolyPhen_prediction
0  0.82        0             tol                 beg
1  0.85        0             tol                 beg
2  1.42        0             tol                 beg
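
Note that str.extract returns strings, so the extracted scores still have object dtype. If numeric values are needed, one possible follow-up is to convert them with pd.to_numeric. The following is a minimal sketch, assuming the same sample frame as in the answer; the group names `prediction` and `score` and the loop over both columns are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'SIFT': ['tol(0.82)', 'tol(0.85)', 'tol(1.42)'],
                   'PolyPhen': ['beg(0)', 'beg(0)', 'beg(0)']})

# Escaped parentheses match the literal "(" and ")" around the score.
# The named groups become the column names of the extracted frame.
pat = r'(?P<prediction>.*?)\((?P<score>.*?)\)'

for col in ['SIFT', 'PolyPhen']:
    parts = df[col].str.extract(pat)
    df[col + '_prediction'] = parts['prediction']
    # errors='coerce' turns anything non-numeric into NaN instead of raising.
    df[col] = pd.to_numeric(parts['score'], errors='coerce')

print(df.dtypes)
```

After this, the SIFT column can be used in numeric comparisons such as df['SIFT'] > 0.5.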

Alternative:

df[['SIFT_prediction','SIFT']] = df['SIFT'].str.rstrip(')').str.split('(', expand=True)
df[['PolyPhen_prediction','PolyPhen']] = df['PolyPhen'].str.rstrip(')').str.split('(', expand=True)
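
The two approaches can behave differently when a cell contains no parentheses at all. The sketch below compares them on a small Series; the second value, 'tolerated', is a hypothetical input made up for illustration and does not come from the question's data.

```python
import pandas as pd

# Second value is hypothetical: a cell without any parentheses.
s = pd.Series(['tol(0.82)', 'tolerated'])

# Regex approach: a row that does not match yields NaN in every group.
extracted = s.str.extract(r'(.*?)\((.*?)\)')

# Split approach: the text is kept in the first column,
# and the missing second part is filled with a missing value.
split_parts = s.str.rstrip(')').str.split('(', expand=True)

print(extracted)
print(split_parts)
```

Which behavior is preferable depends on whether cells without a score should keep their label or be treated as missing entirely.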
