How to extract numeric information from a string in Pandas?

Question

I have a column in my dataframe that contains strings such as:

'(0.0,0.8638888888888889,3.7091666666666665,12.023333333333333,306.84694444444443)'

This output (produced by another program) corresponds to the min, 25th, median, 75th and max for a given variable.

I would like to extract these values and put them in separate numeric columns, such as:

min   p25    p50
0.0   0.864  3.70

The data I have is really large. How can I do that in Pandas?

Many thanks!

EdChum · Accepted Answer

IIUC then the following should work:

In [280]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col':['(0.0,0.8638888888888889,3.7091666666666665,12.023333333333333,306.84694444444443)']})
df

Out[280]:
                                                 col
0  (0.0,0.8638888888888889,3.7091666666666665,12....

In [297]:
df[['min','p25','p50']] = df['col'].str.replace(r"'|\(|\)", '', regex=True).str.split(',', expand=True).astype(np.float64)[[0,1,2]]
df

Out[297]:
                                                 col  min       p25       p50
0  (0.0,0.8638888888888889,3.7091666666666665,12....  0.0  0.863889  3.709167

This removes the ', ( and ) characters by replacing them with an empty string using str.replace. It then splits on the commas using str.split with expand=True, which gives one column per value, casts the result to float, and finally selects the columns of interest by position.
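If all five values are needed, the same idea can be sketched as a standalone script. This is only a minimal illustration, not a tuned solution for very large data: it assumes every row holds exactly five comma-separated numbers wrapped in parentheses (and possibly quotes), and the column names p75 and max are my own choice, following the naming in the question. It uses str.strip instead of a regex to drop the surrounding characters.

```python
import pandas as pd

# Sample row in the format shown in the question.
df = pd.DataFrame({
    "col": ["(0.0,0.8638888888888889,3.7091666666666665,12.023333333333333,306.84694444444443)"]
})

# Column names are an assumption: min, quartiles and max, in the stated order.
names = ["min", "p25", "p50", "p75", "max"]

parts = (
    df["col"]
    .str.strip("'()")             # drop leading/trailing quotes and parentheses
    .str.split(",", expand=True)  # one column per comma-separated value
    .astype(float)                # convert the string pieces to floats
)
parts.columns = names

# Attach the numeric columns to the original frame (aligned on the index).
df = df.join(parts)
print(df[names])
```

Because str.strip only touches the ends of each string, it leaves the commas and digits in the middle alone, so no regular expression is needed for this particular format.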
