Yoss

Reputation: 546

How to split a column in a DataFrame into a list of tuples

I found some answers online, but I have no experience with regular expressions, which I believe are what is needed here. If there is another way, that would be better.

I have a complex column in my dataframe that needs to be split on any of the characters ',', ';', '(', ')' or ':'.

Example string:

(36%) (litopenaaus varmrn ), une chapelure (25%) [vmaaî fmur, water,) sel, soja 0i), sucre, levure), eau. î farine de whca, amidon de mais, sart, cre. regulators (450, 500, stg). soybean [containing an antioxidant (300)]. sucre, powder of gariic, levure, th ci nœ (412). contient des crevettes"

It should be split into a list containing the following:

["36%", "litopenaaus varmrn", "une chapelure (25%)", ["vmaaî fmur", "water", "sel", "soja 0i", "sucre", "levure"], "eau. î farine de whca", "amidon de mais", "sart", "cre. regulators ["(450, 500, stg)"]. soybean [containing an antioxidant (300)]. sucre", "powder of gariic", "levure"," th ci nœ (412). contient des crevettes"]

The code I have written to do this looks like this, but nothing happened:

import re

delimiters = ",", ":", "(", ")", ";"
regexPattern = '|'.join(map(re.escape, delimiters))

df['splited'] = df.ingredient.apply(lambda row: ' '.join((re.split(regexPattern, str(row)))))

Upvotes: 0

Views: 26

Answers (1)

Daweo

Reputation: 36390

By doing

delimiters = ",", ":", "(", ")", ";"
regexPattern = '|'.join(map(re.escape, delimiters))

df['splited'] = df.ingredient.apply(lambda row: ' '.join((re.split(regexPattern, str(row)))))

you actually split the string (re.split) and then joined the resulting parts back together with a space character (' '.join). If you need a list of parts rather than a single new string, simply do not join them, i.e.

df['splited'] = df.ingredient.apply(lambda row: re.split(regexPattern, str(row)))
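As a minimal sketch of the same idea outside the dataframe, the snippet below splits one string on the delimiters from the question and returns the parts as a list. The helper name split_ingredients and the short sample string are made up for illustration. Stripping whitespace and dropping empty parts is an extra assumption about what the output should look like, not something the question requires.

```python
import re

# Same delimiters as in the question, escaped so "(" and ")" are treated literally.
delimiters = ",", ":", "(", ")", ";"
regexPattern = '|'.join(map(re.escape, delimiters))


def split_ingredients(text):
    """Hypothetical helper: split text on any delimiter and return a list."""
    parts = re.split(regexPattern, str(text))
    # Assumption: surrounding whitespace and empty strings are not wanted.
    return [part.strip() for part in parts if part.strip()]


# Made-up sample string, much shorter than the one in the question.
sample = "(36%) (shrimp), breadcrumbs: flour, water; salt"
print(split_ingredients(sample))
```

If this shape of output is what you want, the helper can be passed to apply in place of the lambda, i.e. df['splited'] = df.ingredient.apply(split_ingredients).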

Upvotes: 1
