How to apply .strip().split() to an entire column in a Pandas dataframe

Question

Example of Dataframe

My Pandas dataframe has a column EvaRange whose values are stored as follows:

<1000 mm
1000-1200mm
1200-1400mm
>1400mm
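The sketch below rebuilds the sample column so the snippets on this page can be run. The four values are copied from above. The variable name df is assumed to match the question, and the rest of the real dataframe is omitted.

```python
import pandas as pd

# Hypothetical sample frame reproducing the four EvaRange values shown above
df = pd.DataFrame({"EvaRange": ["<1000 mm", "1000-1200mm", "1200-1400mm", ">1400mm"]})
print(df)
```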

Desired Output

I want to run some machine learning on the dataframe, so I need to convert each value into a single number.

So far I have managed to do this for a single row of the dataframe, but I want to apply it to the entire column.

Code Example

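# Works for a range value such as "1000-1200mm"; a value like "<1000 mm"
# fails, because "<1000 " cannot be parsed as a float and there is no second bound
a = df["EvaRange"][0].strip().split('mm')[0].split('-')
b = (float(a[0]) + float(a[1])) / 2
b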

This returns the average of the two bounds when the value contains two numbers.

Request

Could someone help me generalize this so that I can apply it to the entire column and handle the "<" and ">" values?
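One direct way to generalize the single-row code is to wrap it in a function and pass that to Series.apply. The following is a minimal sketch, not the accepted answer's method. It assumes that an open-ended value such as "<1000 mm" should map to its single bound, which is the same result the answers below produce. The helper name range_to_number and the column EvaRangeNum are hypothetical.

```python
import pandas as pd

# Hypothetical sample data matching the question
df = pd.DataFrame({"EvaRange": ["<1000 mm", "1000-1200mm", "1200-1400mm", ">1400mm"]})


def range_to_number(value: str) -> float:
    """Convert one EvaRange string to a single float (hypothetical helper).

    Same idea as the single-row code: drop the unit, split on '-',
    and average the bounds. The '<' and '>' markers are simply removed,
    so an open-ended value like '<1000 mm' becomes its one bound (1000.0).
    """
    cleaned = value.strip().replace("mm", "").strip("<> ")
    parts = [float(p) for p in cleaned.split("-")]
    return sum(parts) / len(parts)


# apply calls the helper once per row, which is simple but slower than
# the vectorized .str methods on large frames
df["EvaRangeNum"] = df["EvaRange"].apply(range_to_number)
print(df)
```

Mapping "<1000" to 1000 is a modelling choice. If the open ends matter for the model, a separate indicator column may be worth keeping.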

sammywemmy · Accepted Answer

I would suggest using str.extractall to get all the numbers, then taking the mean per original row (the first index level):

# groupby(level=0).mean() replaces mean(level=0), which was removed in pandas 2.0
df.EvaRange.str.extractall(r"(\d+)").astype(float).groupby(level=0).mean()

         0
0   1000.0
1   1100.0
2   1300.0
3   1400.0
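The "first level" comes from the shape of what extractall returns. It produces one row per regex match, indexed by the original row label plus a level named "match". Averaging over level 0 therefore collapses the matches back to one value per original row. The sketch below shows the intermediate result and stores the average in a new column. The column name EvaRangeNum is hypothetical, and the pattern `\d+` only captures whole numbers, so a decimal such as "1000.5" would be split into two matches.

```python
import pandas as pd

# Hypothetical sample data matching the question
df = pd.DataFrame({"EvaRange": ["<1000 mm", "1000-1200mm", "1200-1400mm", ">1400mm"]})

# One row per match; index is (original row label, "match" number),
# column 0 holds the captured digits as strings
matches = df["EvaRange"].str.extractall(r"(\d+)")
print(matches)

# Average the matches per original row (index level 0); assignment aligns on the index
df["EvaRangeNum"] = matches[0].astype(float).groupby(level=0).mean()
print(df)
```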

Building on your idea of strip and split:

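(df.EvaRange
 .str.strip("<> mm")   # remove '<', '>', spaces and 'm' from both ends
 .str.split("-")       # "1000-1200" -> ["1000", "1200"]
 .explode()            # one row per number; original index is repeated
 .astype(float)
 .groupby(level=0)     # regroup by original row label (replaces mean(level=0))
 .mean()
 )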

0    1000.0
1    1100.0
2    1300.0
3    1400.0
Name: EvaRange, dtype: float64
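The strip step works because str.strip, and pandas' Series.str.strip, treat their argument as a set of characters rather than a substring. "<> mm" means "remove any '<', '>', ' ' or 'm' from either end", so the order of the characters does not matter. The short check below uses plain Python strings on the sample values to illustrate this.

```python
# str.strip treats its argument as a set of characters, not a substring
samples = ["<1000 mm", "1000-1200mm", "1200-1400mm", ">1400mm"]

for s in samples:
    # Order and repetition inside the set do not matter: "<> mm" behaves like "m<> "
    assert s.strip("<> mm") == s.strip("m<> ")
    print(repr(s), "->", repr(s.strip("<> mm")))
```

Because only the ends are stripped, the "-" between the two bounds survives and the following split("-") still sees it.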
