Reputation: 546
I found some answers online, but I have no experience with regular expressions, which I believe are what is needed here. If there is another way, that would be even better.
I have a complex column in my dataframe that needs to be split on any of these characters: ',' ';' '(' ')' ':'
Example string:
(36%) (litopenaaus varmrn ), une chapelure (25%) [vmaaî fmur, water,) sel, soja 0i), sucre, levure), eau. î farine de whca, amidon de mais, sart, cre. regulators (450, 500, stg). soybean [containing an antioxidant (300)]. sucre, powder of gariic, levure, th ci nœ (412). contient des crevettes"
It should be split into a list containing the following:
["36%", "litopenaaus varmrn", "une chapelure (25%)", ["vmaaî fmur", "water", "sel", "soja 0i", "sucre", "levure"], "eau. î farine de whca", "amidon de mais", "sart", "cre. regulators ["(450, 500, stg)"]. soybean [containing an antioxidant (300)]. sucre", "powder of gariic", "levure"," th ci nœ (412). contient des crevettes"]
The code I have written to do this looks like this, but nothing happened:
delimiters = ",", ":", "(", ")", ";"
regexPattern = '|'.join(map(re.escape, delimiters))
df['splited'] = df.ingredient.apply(lambda row: ' '.join((re.split(regexPattern, str(row)))))
Upvotes: 0
Views: 26
Reputation: 36390
By doing
delimiters = ",", ":", "(", ")", ";"
regexPattern = '|'.join(map(re.escape, delimiters))
df['splited'] = df.ingredient.apply(lambda row: ' '.join((re.split(regexPattern, str(row)))))
you actually split the string (re.split) and then joined the resulting parts back together with a space character (' '.join). If you need a list of parts rather than a single new string, simply do not join them, i.e.
df['splited'] = df.ingredient.apply(lambda row: re.split(regexPattern, str(row)))
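To see the difference outside of pandas, here is a minimal sketch on a single short string. The helper name split_ingredients and the sample string are made up for illustration, and dropping empty or whitespace-only parts is an extra step that the question did not ask for, so treat it as optional.

```python
import re

delimiters = ",", ":", "(", ")", ";"
# re.escape is needed because "(" and ")" are special in regular expressions
regex_pattern = "|".join(map(re.escape, delimiters))


def split_ingredients(text):
    """Split text on any delimiter and return a list of cleaned parts.

    Hypothetical helper: stripping whitespace and dropping empty parts
    is an assumption about the desired output, not part of the original code.
    """
    parts = re.split(regex_pattern, str(text))
    return [part.strip() for part in parts if part.strip()]


sample = "sucre, levure (412); eau: sel"

# Joining the parts again yields one string, which is why the
# original attempt looked as if nothing had been split.
joined = " ".join(re.split(regex_pattern, sample))
print(type(joined).__name__)

# Without the join, the result stays a list.
print(split_ingredients(sample))
```

With a dataframe, the same helper could be passed directly to apply, e.g. df['splited'] = df.ingredient.apply(split_ingredients). Note that a flat split like this cannot reproduce the nested list shown in the question; that would need a small parser rather than a single re.split call.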
Upvotes: 1