Reputation: 3520
I am parsing a large DBF file to import into MongoDB.
One of the fields in the DBF file is called Description, and its values look like this:
WOMEN'S CC CROPPED TOP T-SHIRT - MELANGE GREY - S
WOMEN'S CC CROPPED TOP T-SHIRT - MELANGE GREY - M
WOMEN'S CC CROPPED TOP T-SHIRT - WHITE- L
WOMEN'S CC CROPPED TOP T-SHIRT- WHITE -XL
WOMEN'S CC CROPPED TOP T-SHIRT- WHITE -2XL
WOMEN'S CC CROPPED TOP T-SHIRT- WHITE -3XL
JUNIOR EP ORGANIC T-SHIRT - YELLOW- 3-4 YRS
JUNIOR EP ORGANIC T-SHIRT - YELLOW - 5-6 YRS
EP ORGANIC BIB - PINK -ONE SIZE
What will be the best way to split this so that I have the product name, colour and size?
In most cases, I can do:
    try:
        description, colour_name, size = style_meta_attributes['CN_DESC'].split('- ')
        if colour_name not in colour_names:
            colour_names.append(colour_name)
        if size not in sizes_names:
            sizes_names.append(size)
    except:
        try:
            description, colour_name, size = style_meta_attributes['CN_DESC'].split(' -')
            ...
and so on for each delimiter in splits = [' - ', '- ', ' -', ' -', ' - ', '-']
But this does not work when I have T-SHIRT or 3-4 YRS, because the bare '-' delimiter also splits inside those values.
Any advice much appreciated.
Upvotes: 1
Views: 64
Reputation: 214949
Try
re.split(r'\s+-\s*|\s*-\s+', description)
The idea is to require whitespace before OR after the delimiter (or on both sides).
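As a rough sketch of how this could plug into the import loop, the pattern can be compiled once and wrapped in a small helper. The names `DELIMITER`, `split_description` and `samples` are made up for illustration, and the sketch assumes every description has exactly three fields (name, colour, size):

```python
import re

# A hyphen counts as a delimiter only if it has whitespace on at least one side,
# so the hyphens inside "T-SHIRT" and "3-4 YRS" are left alone.
DELIMITER = re.compile(r'\s+-\s*|\s*-\s+')

def split_description(text):
    """Return (name, colour, size); raise ValueError if the shape is unexpected."""
    parts = [part.strip() for part in DELIMITER.split(text)]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {text!r}")
    return tuple(parts)

# A few of the descriptions from the question.
samples = [
    "WOMEN'S CC CROPPED TOP T-SHIRT - MELANGE GREY - S",
    "WOMEN'S CC CROPPED TOP T-SHIRT- WHITE -XL",
    "JUNIOR EP ORGANIC T-SHIRT - YELLOW- 3-4 YRS",
    "EP ORGANIC BIB - PINK -ONE SIZE",
]

# Sets avoid the "if x not in list" checks when collecting unique values.
colour_names, size_names = set(), set()
for sample in samples:
    name, colour, size = split_description(sample)
    colour_names.add(colour)
    size_names.add(size)
```

A description whose product name itself contains a spaced hyphen would produce more than three parts; the helper raises `ValueError` in that case so such rows can be logged and checked by hand rather than silently mis-split.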
Upvotes: 2