ℕʘʘḆḽḘ

Reputation: 19375

How to extract numeric information from a string in Pandas?

I have a column in my dataframe that contains strings such as:

'(0.0,0.8638888888888889,3.7091666666666665,12.023333333333333,306.84694444444443)'

This output (produced by another program) corresponds to the minimum, 25th percentile, median, 75th percentile and maximum of a given variable.

I would like to extract those values and put them in separate numeric columns, such as

min   p25    p50
0.0   0.864  3.71

My dataset is very large. How can I do that in Pandas?

Many thanks!

Upvotes: 0

Views: 143

Answers (1)

EdChum

Reputation: 393943

IIUC then the following should work:

In [280]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col':['(0.0,0.8638888888888889,3.7091666666666665,12.023333333333333,306.84694444444443)']})
df

Out[280]:
                                                 col
0  (0.0,0.8638888888888889,3.7091666666666665,12....

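In [297]:
# regex=True is required in recent pandas versions, where str.replace treats the pattern as a literal string by default
df[['min','p25','p50']] = df['col'].str.replace(r"['()]", '', regex=True).str.split(',', expand=True).astype(np.float64)[[0,1,2]]
df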

Out[297]:
                                                 col  min       p25       p50
0  (0.0,0.8638888888888889,3.7091666666666665,12....  0.0  0.863889  3.709167

This removes the ', ( and ) characters with str.replace, splits each string on the commas with str.split (expand=True returns one column per value), casts the result to float, and then selects the columns of interest.
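If all five statistics are needed rather than just the first three, the same idea can be sketched as below. This is only an illustration: the column names p75 and max are my own assumption (the question only names min, p25 and p50), and it uses str.strip instead of str.replace on the assumption that the unwanted characters only appear at the ends of each string.

```python
import pandas as pd

df = pd.DataFrame({'col': ['(0.0,0.8638888888888889,3.7091666666666665,12.023333333333333,306.84694444444443)']})

# Assumed column names: 'p75' and 'max' do not appear in the question.
names = ['min', 'p25', 'p50', 'p75', 'max']

# Strip the surrounding parentheses (and any stray quotes) from both ends,
# split on commas into one column per value, then convert everything to float.
parts = df['col'].str.strip("()'").str.split(',', expand=True).astype(float)
parts.columns = names

# Attach the numeric columns to the original frame.
df = df.join(parts)
print(df[names])
```

Both versions rely on vectorised string methods, so they avoid a Python-level loop over rows, which should help on a large frame.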

Upvotes: 1
