Reputation: 49
The final output is to exclude unexpected characters in product title.
Input:
import pandas as pd
df = pd.DataFrame({'product': [['cream', 'with', 'vegetable', 'oil', '(4+2', 'free)', '170', 'g', 'almarai', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN', 'NaN']],
})
Expected output
1. To keep text and number only.
2. To drop everything else
creamwithvegetableoil42free170galmarai
Thanks!
Upvotes: 0
Views: 25
Reputation: 42916
First we use Series.explode
to unnest your list to a row for each element
Then we can use Series.str.extractall
with \w+
and \d+
to get all word characters + numbers out of your column.
Finally we use GroupBy.agg
with ''.join
to concatenate your strings back to each other:
(df['product'].explode()
.str.extractall('(\w+|\d+)')
.replace('NaN', '')
.groupby(level=0).agg(''.join))
0
0 creamwithvegetableoil42free170galmarai
Upvotes: 1