How can I extract numbers as well as string from multiple rows in a Data Frame column?

Question

DF1

index|Number
0    |[Number 1]
1    |[Number 2]
2    |[kg]
3    |[]
4    |[kg,Number 3]

In the Number column of my dataframe, I need to extract the number if one is present, kg if the string contains kg, and NaN if there is no value. If a row has both a number and kg, I want only the number.

Expected Output

index|Number
0    |1
1    |2
2    |kg
3    |NaN
4    |3

I wrote a lambda function for this, but I am getting an error:

NumorKG = lambda x: x.str.extract('(\d+)') if x.str.extract('(\d+)').isdigit() else 'kg' if x.str.find('kg') else "NaN"

DF1['Number']=DF1['Number'].apply(NumorKG)

The error that I am getting is:

AttributeError: 'str' object has no attribute 'str'

jezrael · Accepted Answer

Use numpy.where to set the values:

import numpy as np

#extract numeric to Series
d = df['Number'].str.extract(r'(\d+)', expand=False)
#test if digit
mask1 = d.str.isdigit().fillna(False)

#test if values contains kg
mask2 = df['Number'].str.contains('kg', na=False)

df['Number'] = np.where(mask1, d, 
               np.where(mask2 & ~mask1, 'kg',np.nan))
print (df)
  Number
0      1
1      2
2     kg
3    nan
4      3
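Note that row 3 above prints as nan rather than NaN: the inner np.where mixes 'kg' with np.nan, so NumPy most likely coerces the missing value to the string 'nan'. If a real missing value is needed, one possible variant is to start from the extracted digits and fill in 'kg' with Series.mask. This is only a sketch; the sample frame and the names digits, has_kg and result are assumptions made for illustration, and it assumes the column holds plain strings.

```python
import numpy as np
import pandas as pd

# Sample data assumed to match the question: every cell is a plain string
df = pd.DataFrame({'Number': ['[Number 1]', '[Number 2]', '[kg]', '[]', '[kg,Number 3]']})

# First run of digits per row, NaN where no digits are found
digits = df['Number'].str.extract(r'(\d+)', expand=False)

# True where the original string contains 'kg'
has_kg = df['Number'].str.contains('kg', na=False)

# Put 'kg' only where no digits were found; other rows keep digits or NaN
result = digits.mask(digits.isna() & has_kg, 'kg')
print(result)
```

Here row 3 stays a genuine missing value, so checks such as result.isna() still work on it.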

Your own approach also works once it is rewritten as a plain function that operates on each string:

import re
import numpy as np

def NumorKG(x):
    #find all runs of digits in the string
    a = re.findall(r'(\d+)', x)
    if len(a) > 0:
        return a[0]
    elif 'kg' in x:
        return 'kg'
    else:
        return np.nan

df['Number']=df['Number'].apply(NumorKG)
print (df)
  Number
0      1
1      2
2     kg
3    NaN
4      3
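The AttributeError in the question comes from the fact that Series.apply calls the function once per element, so x is an ordinary Python string and has no .str accessor. A small check, assuming the column holds plain strings (the Series s and the name seen_types are made up for this illustration):

```python
import pandas as pd

# Two sample values in the same shape as the question's column
s = pd.Series(['[Number 1]', '[kg]'])

# apply passes each element to the function, not the whole Series
seen_types = s.apply(lambda x: type(x).__name__)
print(seen_types.tolist())

# The .str accessor exists on the Series but not on a single element
print(hasattr(s, 'str'), hasattr(s.iloc[0], 'str'))
```

That is why the function version uses re.findall and the in operator directly on x instead of x.str methods.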

The lambda function can be fixed the same way:

NumorKG = lambda x: (re.findall(r'(\d+)', x)[0]
                     if len(re.findall(r'(\d+)', x)) > 0
                     else 'kg'
                     if 'kg' in x
                     else np.nan)
