Zito Relova

Reputation: 1041

Leave missing values blank when creating a new column in a dataframe

I have a dataframe with a description column and I am trying to parse out measurements from text in that column.

df['measurements'] = [re.findall('\S+\scm', i) + re.findall('\S+cm', i) for i in df['description'] if i is not None]
#...

Some rows in the description column are empty, so the code above raises a ValueError because the length of the values does not match the length of the index. How can I insert a filler value such as NaN for the empty rows, so that the number of values matches the index and the new measurements column can be created?

The desired output would look something like this:

description                       measurements 
blabla 32cm x 24cm x 12cm blabla  ['32cm', '24cm', '12cm']
NaN                               NaN
18cm x 15cm x 10cm blablabla      ['18cm', '15cm', '10cm']
NaN                               NaN

Upvotes: 1

Views: 843

Answers (1)

jezrael

Reputation: 862671

I think you need str.findall, which handles missing values (None/NaN) without any extra checks: it returns NaN for those rows in the output:

df['measurements'] = df['description'].str.findall(r'\S+\scm') + \
                     df['description'].str.findall(r'\S+cm')

And if you need empty lists instead of NaN for the missing rows, the simplest way is to use fillna first:

des = df['description'].fillna('')
df['measurements'] = des.str.findall(r'\S+\scm') + des.str.findall(r'\S+cm')
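
A possible simplification, offered only as a sketch: the two patterns differ just in the whitespace before cm, so a single pattern with an optional whitespace character, r'\S+\s?cm', should cover both forms in one pass. The small frame below is made-up illustration data, not the data from the question. Matches come back in order of appearance, whereas concatenating the two findall results puts the spaced matches first.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not from the question):
# one row mixes the "32 cm" and "24cm" spellings, one row is missing.
df = pd.DataFrame({'description': ['box 32 cm x 24cm x 12cm', np.nan]})

# \s? makes the whitespace before "cm" optional, so one pass handles both forms.
# Missing descriptions stay NaN, exactly as with the two-pattern version.
df['measurements'] = df['description'].str.findall(r'\S+\s?cm')
print(df)
```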

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'description':['blabla 32cm x 24cm x 12cm blabla',np.nan,
                                  '18cm x 15cm x 10cm blablabla',np.nan]})
print (df)
                        description
0  blabla 32cm x 24cm x 12cm blabla
1                               NaN
2      18cm x 15cm x 10cm blablabla
3                               NaN

df['measurements'] = df['description'].str.findall(r'\S+\scm') + \
                     df['description'].str.findall(r'\S+cm')

print (df)
                        description        measurements
0  blabla 32cm x 24cm x 12cm blabla  [32cm, 24cm, 12cm]
1                               NaN                 NaN
2      18cm x 15cm x 10cm blablabla  [18cm, 15cm, 10cm]
3                               NaN                 NaN

des = df['description'].fillna('')
df['measurements'] = des.str.findall(r'\S+\scm') + des.str.findall(r'\S+cm')
print (df)
                        description        measurements
0  blabla 32cm x 24cm x 12cm blabla  [32cm, 24cm, 12cm]
1                               NaN                  []
2      18cm x 15cm x 10cm blablabla  [18cm, 15cm, 10cm]
3                               NaN                  []
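
For comparison, the list comprehension from the question can also be made to work. The trailing `if i is not None` acts as a filter, so rows holding None are dropped and the list ends up shorter than the index; rows holding NaN pass the filter (NaN is a float, not None) and re.findall then fails on them. Moving the check into the expression keeps one result per row. A minimal sketch, where find_measurements is a hypothetical helper name:

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'description': ['blabla 32cm x 24cm x 12cm blabla', np.nan,
                                   '18cm x 15cm x 10cm blablabla', np.nan]})

def find_measurements(text):
    # Hypothetical helper: return NaN for anything that is not a string
    # (None, NaN, ...) so every row still produces exactly one value.
    if not isinstance(text, str):
        return np.nan
    return re.findall(r'\S+\scm', text) + re.findall(r'\S+cm', text)

# No filtering clause here, so the list has the same length as the index.
df['measurements'] = [find_measurements(i) for i in df['description']]
print(df)
```

The str.findall version above is shorter and is probably the better choice; this variant is mainly useful when the per-row logic grows beyond what a single regex call can express.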

Upvotes: 2
