Reputation: 115
I have a pandas data frame that looks like
Run Time ... K Recovery Ca Recovery
0 14 05:02:54 ... 61,498.671 (492.0%) 62,095.863 (496.8%)
1 19 08:14:59 ... 63,576.997 (508.6%) 63,986.691 (511.9%)
2 35 10:30:42 ... 63,609.755 (508.9%) 64,400.180 (515.2%)
I want it to only keep the percentages and remove everything that isn't numeric so that it looks like this:
Run Time ... K Recovery Ca Recovery
0 14 05:02:54 ... 492.0 496.8
1 19 08:14:59 ... 508.6 511.9
2 35 10:30:42 ... 508.9 515.2
I was able to isolate the percentages by adding this re.findall(r'\(.*?\)', CaRecovery)
function to each individual string when I was creating the lists that make up my pandas data base, however this gave me some odd formatting problems:
Run Time Be Recovery ... Al Recovery K Recovery Ca Recovery
0 14 05:02:54 [(98.2%)] ... [(487.1%)] [(492.0%)] [(496.8%)]
1 19 08:14:59 [(101.6%)] ... [(499.8%)] [(508.6%)] [(511.9%)]
2 35 10:30:42 [(101.5%)] ... [(502.9%)] [(508.9%)] [(515.2%)]
It added the square brackets around the parentheses, and now for some reason
df = df.replace(r'[%]', '', regex=True)
has no effect on the database.
I need it just to be the numbers so I can convert the columns into floats.
Upvotes: 1
Views: 629
Reputation: 4127
Using groupings and pulling response out of the returned list:
import re
test_string = r"62,095.863 (496.8%)"
pattern = r"\(([^%]*)%?\)" # without %
print(re.findall(pattern, test_string)[0])
pattern2 = r"\((.*)\)" # with %
print(re.findall(pattern2, test_string)[0])
Upvotes: 1
Reputation: 195528
Try:
df.update(
df.filter(regex=r"Recovery$").apply(
lambda x: x.str.extract(r"\(([\d.-]+)%\)", expand=False)
)
)
print(df)
Prints:
Run Time K Recovery Ca Recovery
0 14 05:02:54 492.0 496.8
1 19 08:14:59 508.6 511.9
2 35 10:30:42 508.9 515.2
Upvotes: 4