Aby_g

Reputation: 53

How to keep only the float values in a pandas column

I have a column that looks like this:

col1
20.5
21.2
21.2
17315/06/2021 09:06:481032.14310134.91082996.3001047998.93380132341231
0060232346956263174
$365140110030
$36516011007C27
$3651501100E743

I want only the floating-point values to remain in the column. I've tried various replace methods, with no luck:

df['col1'] = df['col1'].replace(r'/ [ ^\d.] / g', '', regex=True, inplace=False)

This appears to do nothing,

or

df['Temp'] = df['Temp'].replace(r'/ [ ^\d.] / g', '', regex=True, inplace=True)

This sets all values to NaN.

Upvotes: 0

Views: 1670

Answers (1)

FObersteiner

Reputation: 25544

One option is to find all "digit-dot-digit" sequences in each element of the column and cast to numeric only if there is exactly one match:

import pandas as pd

df = pd.DataFrame({"col1": [
            20.5,
            21.2,
            21.2,
            "17315/06/2021 09:06:481032.14310134.91082996.3001047998.93380132341231",
            "0060232346956263174",
            "$365140110030",
            "$36516011007C27",
            "$3651501100E743",
            "This is a cell with a float 5.4",
            -50.0 ]})

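# with an apply/lambda
# df['floats'] = df['col1'].astype(str).str.findall(r"-?\d+\.\d+").apply(lambda x: pd.to_numeric(*x) if len(x)==1 else None)

# you can also avoid the apply/lambda with a temporary series:
# s holds, per row, the list of all optionally signed "digits.digits" matches
s = df['col1'].astype(str).str.findall(r"-?\d+\.\d+")
# keep only rows with exactly one match; the other rows become NaN via index alignment
df['floats'] = pd.to_numeric(s[s.str.len() == 1].str[0])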

print(df['floats'])
0    20.5
1    21.2
2    21.2
3     NaN
4     NaN
5     NaN
6     NaN
7     NaN
8     5.4
9   -50.0
Name: floats, dtype: float64
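
Note that the approach above also extracts a float embedded in longer text (row 8). If a cell should count only when it consists entirely of one float, a stricter variant could use `Series.str.fullmatch` instead of `findall`. This is a minimal sketch under that assumption, not part of the original answer; the shortened sample frame and the names `text` and `is_float` are made up for illustration:

```python
import pandas as pd

# Shortened, hypothetical sample in the spirit of the data above.
df = pd.DataFrame({"col1": [
    20.5,
    "21.2",
    "0060232346956263174",
    "$3651501100E743",
    "This is a cell with a float 5.4",
    -50.0,
]})

# Assumed stricter rule: the whole cell must be one optionally signed decimal number.
text = df["col1"].astype(str).str.strip()
is_float = text.str.fullmatch(r"-?\d+\.\d+")

# Cells that fail the test are masked to NaN; the rest are converted to float64.
df["floats"] = pd.to_numeric(text.where(is_float), errors="coerce")

print(df["floats"])
```

With this rule only the first, second, and last rows should keep a value. The cell containing "5.4" inside a sentence is rejected, and so is the long digit string, which a plain `pd.to_numeric(..., errors="coerce")` on the raw column would instead turn into a very large number.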

Upvotes: 1
