Reputation: 1127
There are a lot of similar questions, but I have not found a solution for my problem. I have a data frame with the following structure/form:
col_1
0 BULKA TARTA 500G KAJO 1
1 CUKIER KRYSZTAL 1KG KSC 4
2 KASZA JĘCZMIENNA 4*100G 2 0.92
3 LEWIATAN MAKARON WSTĄŻKA 1 0.89
However, I want to achieve the effect:
col_1
0 BULKA TARTA 500G KAJO
1 CUKIER KRYSZTAL 1KG KSC
2 KASZA JĘCZMIENNA 4*100G
3 LEWIATAN MAKARON WSTĄŻKA
So I want to remove the independent natural and decimal numbers, but leave the numbers in the string with the letters.
I tried to use df.col_1.str.isdigit().replace([True, False],[np.nan, df.col_1])
, but it only works on comparing the entire cell whether it is a number or not.
You have some ideas how to do it? Or maybe it would be good to break the column with spaces and then compare?
Upvotes: 0
Views: 927
Reputation: 18906
We could create a function that tries to convert to float. If it fails we return True (not_float)
import pandas as pd
df = pd.DataFrame({"col_1" : ["BULKA TARTA 500G KAJO 1",
"CUKIER KRYSZTAL 1KG KSC 4",
"KASZA JĘCZMIENNA 4*100G 2 0.92",
"LEWIATAN MAKARON WSTĄŻKA 1 0.89"]})
def is_not_float(string):
try:
float(string)
return False
except ValueError: # String is not a number
return True
df["col_1"] = df["col_1"].apply(lambda x: [i for i in x.split(" ") if is_not_float(i)])
df
Or following the example of my fellow SO:ers. However this would treat 130. as a number.
df["col_1"] = (df["col_1"].apply(
lambda x: [i for i in x.split(" ") if not i.replace(".","").isnumeric()]))
Returns
col_1
0 [BULKA, TARTA, 500G, KAJO]
1 [CUKIER, KRYSZTAL, 1KG, KSC]
2 [KASZA, JĘCZMIENNA, 4*100G]
3 [LEWIATAN, MAKARON, WSTĄŻKA]
Upvotes: 1
Reputation: 287
Yes you can
def no_nums(col):
return ' '.join(filter(lambda word:word.replace('.','').isdigit()==False, col.split()))
df.col_1.apply(no_nums)
This filters out words from each value which are completely made of digits,
And maybe contains a decimal point.
If you want to filter out numbers like 1,000
, simply add another replace
for ','
Upvotes: 0
Reputation: 5632
Sure,
You could use a regex.
import re
df.col_1 = re.sub("\d+\.?\d+?", "", df.col_1)
Upvotes: 1