Reputation: 53
I have a column that looks like this:
col1
20.5
21.2
21.2
17315/06/2021 09:06:481032.14310134.91082996.3001047998.93380132341231
0060232346956263174
$365140110030
$36516011007C27
$3651501100E743
I want only the floating-point values to remain in the column. I've tried various replace methods, with no luck:
df['col1'] = df['col1'].replace(r'/ [ ^\d.] / g', '', regex=True, inplace=False)
This doesn't appear to do anything,
or
df['Temp'] = df['Temp'].replace(r'/ [ ^\d.] / g', '', regex=True, inplace=True)
which sets all values to NaN.
Upvotes: 0
Views: 1670
Reputation: 25544
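The pattern in the question uses JavaScript regex-literal syntax (`/.../g`); Python treats the slashes, spaces and `g` as literal characters, so it never matches. One option is to find all "digit-dot-digit" sequences in each of the column's elements, and cast to numeric only if there is exactly one match: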
import pandas as pd
df = pd.DataFrame({"col1": [
20.5,
21.2,
21.2,
"17315/06/2021 09:06:481032.14310134.91082996.3001047998.93380132341231",
"0060232346956263174",
"$365140110030",
"$36516011007C27",
"$3651501100E743",
"This is a cell with a float 5.4",
-50.0 ]})
# with an apply/lambda
# df['floats'] = df['col1'].astype(str).str.findall(r"-?\d+\.\d+").apply(lambda x: pd.to_numeric(*x) if len(x)==1 else None)
# you can also avoid the apply/lambda with a temporary series:
# (raw strings avoid invalid-escape warnings for \d and \.)
s = df['col1'].astype(str).str.findall(r"-?\d+\.\d+")
df['floats'] = pd.to_numeric(s[s.str.len() == 1].str[0])
print(df['floats'])
0 20.5
1 21.2
2 21.2
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 5.4
9 -50.0
Name: floats, dtype: float64
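If you only want cells that are entirely a float (so that text such as "This is a cell with a float 5.4" is dropped rather than reduced to 5.4), a stricter variant is to match the whole string instead of searching inside it. This is a minimal sketch, assuming pandas 1.1 or later for `str.fullmatch`; the shortened sample data and the name `is_float` are only for illustration:

```python
import pandas as pd

# Shortened, illustrative sample of the column from the question
df = pd.DataFrame({"col1": [
    20.5,
    21.2,
    "0060232346956263174",
    "$365140110030",
    "This is a cell with a float 5.4",
    -50.0,
]})

# Work on string representations so mixed types are handled uniformly
s = df["col1"].astype(str).str.strip()

# True only where the *entire* cell is an optional minus, digits, a dot, digits
is_float = s.str.fullmatch(r"-?\d+\.\d+")

# Non-matching cells are masked out and end up as NaN after conversion
df["floats"] = pd.to_numeric(s.where(is_float), errors="coerce")
print(df["floats"])
```

Note that a plain `pd.to_numeric(df["col1"], errors="coerce")` would not be enough here, because digit-only strings such as "0060232346956263174" would also be converted to numbers.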
Upvotes: 1