Reputation: 19375
I have a column in my dataframe that contains strings such as:
'(0.0,0.8638888888888889,3.7091666666666665,12.023333333333333,306.84694444444443)'
This output (produced by another program) corresponds to the min, 25th percentile, median, 75th percentile and max of a given variable.
I would like to extract these values and put them in separate numeric columns, such as
min p25 p50
0.0 0.864 3.70
The data I have is really large. How can I do that in Pandas?
Many thanks!
Upvotes: 0
Views: 143
Reputation: 393943
If I understand correctly, the following should work:
In [280]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col':['(0.0,0.8638888888888889,3.7091666666666665,12.023333333333333,306.84694444444443)']})
df
Out[280]:
col
0 (0.0,0.8638888888888889,3.7091666666666665,12....
In [297]:
df[['min','p25','p50']] = df['col'].str.replace(r"['()]", '', regex=True).str.split(',', expand=True).astype(np.float64)[[0,1,2]]
df
Out[297]:
col min p25 p50
0 (0.0,0.8638888888888889,3.7091666666666665,12.... 0.0 0.863889 3.709167
So this removes the ', ( and ) characters using str.replace, then splits on the comma using str.split, casts the result to float and finally selects the columns of interest.
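If you want all five statistics rather than just the first three, the same idea can be sketched as follows. This is only a sketch under a few assumptions: every row holds exactly five comma-separated values, the quotes and parentheses appear only at the ends of the string (so str.strip is enough), and the names p75 and max are my own picks following the naming in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "col": [
        "(0.0,0.8638888888888889,3.7091666666666665,12.023333333333333,306.84694444444443)"
    ]
})

# Assumed column names: the first three come from the question,
# 'p75' and 'max' are illustrative.
names = ["min", "p25", "p50", "p75", "max"]

parts = (
    df["col"]
    .str.strip("'()")             # drop surrounding quotes and parentheses
    .str.split(",", expand=True)  # one column per value
    .astype(float)                # convert the string pieces to floats
)
parts.columns = names             # fails loudly if a row does not have five values

# Attach the numeric columns next to the original string column.
df = df.join(parts)
```

All of these steps are vectorised string operations, so they avoid a Python-level loop over the rows, which should help on a large frame.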
Upvotes: 1