Reputation: 1041
I have a dataframe with a description column and I am trying to parse out measurements from text in that column.
df['measurements'] = [re.findall('\S+\scm', i) + re.findall('\S+cm', i) for i in df['description'] if i is not None]
#...
Some of the rows in the description column are empty so the code above gives me a ValueError because the length of values doesn't match the length of the index. How do I append a filler value like NaN if the row is empty so that the length of the values matches the length of the index and the new measurements column can be made?
The desired output would look similar to this:
description                         measurements
blabla 32cm x 24cm x 12cm blabla    ['32cm', '24cm', '12cm']
NaN                                 NaN
18cm x 15cm x 10cm blablabla        ['18cm', '15cm', '10cm']
NaN                                 NaN
Upvotes: 1
Views: 843
Reputation: 862671
I think you need str.findall, which works with Nones perfectly: it returns NaN for those rows in the output:
df['measurements'] = df['description'].str.findall(r'\S+\scm') + \
                     df['description'].str.findall(r'\S+cm')
And if you need empty lists instead of NaN for those rows, the simplest way is to fill the missing descriptions with empty strings using fillna:
des = df['description'].fillna('')
df['measurements'] = des.str.findall(r'\S+\scm') + des.str.findall(r'\S+cm')
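As a possible simplification, the two patterns could be merged into one by making the whitespace before "cm" optional. This is only a sketch: the pattern \S+\s?cm and the column name measurements_filled are my own assumptions, not something from the question, and the matches come back in the order they appear in the text instead of grouped by pattern.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question; the '18 cm' spelling (with a space)
# is added here only to exercise the optional-whitespace case.
df = pd.DataFrame({'description': ['blabla 32cm x 24cm x 12cm blabla', np.nan,
                                   '18 cm x 15cm x 10cm blablabla', np.nan]})

# Assumed pattern: non-whitespace characters, at most one whitespace
# character, then the literal 'cm'. It matches both '32cm' and '32 cm'.
pattern = r'\S+\s?cm'

# Missing descriptions stay NaN in the result.
df['measurements'] = df['description'].str.findall(pattern)

# Missing descriptions become empty lists instead
# (measurements_filled is a hypothetical column name).
df['measurements_filled'] = df['description'].fillna('').str.findall(pattern)

print(df)
```

Whether one pattern is enough depends on the real data, so it is worth checking it against a few actual descriptions first.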
Sample:
import numpy as np
import pandas as pd

df = pd.DataFrame({'description':['blabla 32cm x 24cm x 12cm blabla',np.nan,
                                  '18cm x 15cm x 10cm blablabla',np.nan]})
print (df)
                        description
0  blabla 32cm x 24cm x 12cm blabla
1                               NaN
2      18cm x 15cm x 10cm blablabla
3                               NaN
df['measurements'] = df['description'].str.findall(r'\S+\scm') + \
                     df['description'].str.findall(r'\S+cm')
print (df)
                        description        measurements
0  blabla 32cm x 24cm x 12cm blabla  [32cm, 24cm, 12cm]
1                               NaN                 NaN
2      18cm x 15cm x 10cm blablabla  [18cm, 15cm, 10cm]
3                               NaN                 NaN
des = df['description'].fillna('')
df['measurements'] = des.str.findall(r'\S+\scm') + des.str.findall(r'\S+cm')
print (df)
                        description        measurements
0  blabla 32cm x 24cm x 12cm blabla  [32cm, 24cm, 12cm]
1                               NaN                  []
2      18cm x 15cm x 10cm blablabla  [18cm, 15cm, 10cm]
3                               NaN                  []
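For completeness, the original list comprehension fails because the trailing if filter drops the empty rows from the list, so fewer values than index entries are produced. A minimal sketch of keeping one result per row, assuming you want to stay with re.findall: the helper name find_cm is hypothetical, and the check is on str because missing values in pandas are usually float NaN, not None.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'description': ['blabla 32cm x 24cm x 12cm blabla', np.nan,
                                   '18cm x 15cm x 10cm blablabla', np.nan]})


def find_cm(text):
    """Hypothetical helper: return the matches, or NaN for non-string input."""
    # NaN is a float, so `text is not None` would let it through and
    # re.findall would raise a TypeError; checking for str covers both cases.
    if not isinstance(text, str):
        return np.nan
    return re.findall(r'\S+\scm', text) + re.findall(r'\S+cm', text)


# No filtering clause here: every row yields exactly one value,
# so the list has the same length as the index.
df['measurements'] = [find_cm(i) for i in df['description']]

print(df)
```

The str.findall version is shorter, but this form may be handy if the per-row logic grows beyond what a single regex can express.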
Upvotes: 2