How to split a column in a dataframe into a list of tuples

Question

I found some answers online, but I have no experience with regular expressions, which I believe are what is needed here. If there is another way, that would be even better.

I have a complex column in my dataframe that needs to be split on any of the characters ',' ';' '(' ')' ':'

Example string:

(36%) (litopenaaus varmrn ), une chapelure (25%) [vmaaî fmur, water,) sel, soja 0i), sucre, levure), eau. î farine de whca, amidon de mais, sart, cre. regulators (450, 500, stg). soybean [containing an antioxidant (300)]. sucre, powder of gariic, levure, th ci nœ (412). contient des crevettes"

should be split into a list containing the following

["36%", "litopenaaus varmrn", "une chapelure (25%)", ["vmaaî fmur", "water", "sel", "soja 0i", "sucre", "levure"], "eau. î farine de whca", "amidon de mais", "sart", "cre. regulators ["(450, 500, stg)"]. soybean [containing an antioxidant (300)]. sucre", "powder of gariic", "levure"," th ci nœ (412). contient des crevettes"]

The code I have written to do this looks like this, but nothing happened:

import re

delimiters = ",", ":", "(", ")", ";"
regexPattern = '|'.join(map(re.escape, delimiters))

df['splited'] = df.ingredient.apply(lambda row: ' '.join((re.split(regexPattern, str(row)))))

Daweo · Accepted Answer

By doing

import re

delimiters = ",", ":", "(", ")", ";"
regexPattern = '|'.join(map(re.escape, delimiters))

df['splited'] = df.ingredient.apply(lambda row: ' '.join((re.split(regexPattern, str(row)))))

you actually split the string (re.split) and then joined the resulting parts back together with a space character (' '.join), so each row ends up as a single string again. If you need a list of parts rather than one new string, simply do not join them, i.e.

df['splited'] = df.ingredient.apply(lambda row: re.split(regexPattern, str(row)))
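To see what the unjoined split returns, here is a minimal sketch on a plain string, using only the standard library. The helper name split_ingredients and the sample string are made up for illustration, and stripping whitespace and dropping empty parts is an extra step that goes beyond the one-line fix above. Note that re.split returns a flat list, so this does not reproduce the nested brackets shown in the question's expected output.

```python
import re

# Same delimiters as in the question.
delimiters = ",", ":", "(", ")", ";"
regex_pattern = "|".join(map(re.escape, delimiters))


def split_ingredients(text):
    """Split text on any delimiter, strip whitespace, and drop empty parts.

    Hypothetical helper, not part of the original answer.
    """
    parts = re.split(regex_pattern, str(text))
    return [part.strip() for part in parts if part.strip()]


# Made-up sample string, much shorter than the one in the question.
sample = "(36%) (shrimp), breadcrumbs: flour, water; salt"
print(split_ingredients(sample))

# With pandas, the same helper could be applied per row, assuming the
# dataframe and column names from the question:
# df['splited'] = df.ingredient.apply(split_ingredients)
```

Without the strip and filter step, adjacent delimiters such as ") (" would leave empty or whitespace-only strings in the list.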
