Only select the contents of a string which is surrounded by parentheses (string inside pandas data frame)

Question

I have a pandas data frame that looks like

  Run      Time  ...           K Recovery          Ca Recovery
0  14  05:02:54  ...  61,498.671 (492.0%)  62,095.863 (496.8%)
1  19  08:14:59  ...  63,576.997 (508.6%)  63,986.691 (511.9%)
2  35  10:30:42  ...  63,609.755 (508.9%)  64,400.180 (515.2%)

I want it to only keep the percentages and remove everything that isn't numeric so that it looks like this:

  Run      Time  ...  K Recovery   Ca Recovery
0  14  05:02:54  ...  492.0        496.8
1  19  08:14:59  ...  508.6        511.9
2  35  10:30:42  ...  508.9        515.2

I was able to isolate the percentages by adding this re.findall(r'$.*?$', CaRecovery) function to each individual string when I was creating the lists that make up my pandas data base, however this gave me some odd formatting problems:

  Run      Time Be Recovery  ... Al Recovery  K Recovery Ca Recovery
0  14  05:02:54   [(98.2%)]  ...  [(487.1%)]  [(492.0%)]  [(496.8%)]
1  19  08:14:59  [(101.6%)]  ...  [(499.8%)]  [(508.6%)]  [(511.9%)]
2  35  10:30:42  [(101.5%)]  ...  [(502.9%)]  [(508.9%)]  [(515.2%)]

It added the square brackets around the parentheses, and now for some reason

df = df.replace(r'[%]', '', regex=True)

has no effect on the database.

I need it just to be the numbers so I can convert the columns into floats.

Andrej Kesely · Accepted Answer

Try:

df.update(
    df.filter(regex=r"Recovery$").apply(
        lambda x: x.str.extract(r"$([\d.-]+)%$", expand=False)
    )
)
print(df)

Prints:

   Run      Time K Recovery Ca Recovery
0   14  05:02:54      492.0       496.8
1   19  08:14:59      508.6       511.9
2   35  10:30:42      508.9       515.2

Only select the contents of a string which is surrounded by parentheses (string inside pandas data frame)

Answers (2)

Related Questions