Search from end of string and split

Question

I am parsing a large DBF file to import into MongoDB.

One of the fields in the DBF file is called Description, and its values look like this:

WOMEN'S CC CROPPED TOP T-SHIRT - MELANGE GREY - S
WOMEN'S CC CROPPED TOP T-SHIRT - MELANGE GREY - M
WOMEN'S CC CROPPED TOP T-SHIRT - WHITE- L
WOMEN'S CC CROPPED TOP T-SHIRT- WHITE -XL
WOMEN'S CC CROPPED TOP T-SHIRT- WHITE -2XL
WOMEN'S CC CROPPED TOP T-SHIRT- WHITE -3XL
JUNIOR EP ORGANIC T-SHIRT - YELLOW- 3-4 YRS
JUNIOR EP ORGANIC T-SHIRT - YELLOW - 5-6 YRS
EP ORGANIC BIB - PINK -ONE SIZE

What will be the best way to split this so that I have the product name, colour and size?

In most cases, I can do:

try:
  description, colour_name, size = style_meta_attributes['CN_DESC'].split('- ')
  if colour_name not in colour_names:
    colour_names.append(colour_name)
  if size not in sizes_names:
    sizes_names.append(size)
except ValueError:  # unpacking fails when the split does not yield exactly 3 parts
  try:
    description, colour_name, size = style_meta_attributes['CN_DESC'].split(' -')
 ...

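and so on for each separator in splits = [' - ', '- ', ' -', ' -', ' - ', '-']

but this does not work when I have values such as T-SHIRT or 3-4 YRS, because the hyphen inside them is treated as a separator too.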

Any advice much appreciated.

georg · Accepted Answer

Try

import re

re.split(r'\s+-\s*|\s*-\s+', description)

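The idea is to require whitespace before or after the hyphen (or on both sides) for it to count as a delimiter. Hyphens with no whitespace on either side, as in T-SHIRT or 3-4 YRS, are left alone.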
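As a rough sketch of how this pattern could be applied to the sample rows from the question: the helper name `split_description` is hypothetical, and the code assumes that colour and size are always the last two fields, so any extra delimiter inside the product name is folded back into the name.

```python
import re

# A hyphen counts as a delimiter only if whitespace touches it on at least one side.
DELIMITER = re.compile(r'\s+-\s*|\s*-\s+')


def split_description(text):
    """Hypothetical helper: return (name, colour, size) for one description."""
    parts = DELIMITER.split(text.strip())
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 fields, got {parts!r}")
    # Assumption: colour and size are the last two fields; anything before
    # them belongs to the product name.
    *name_parts, colour, size = parts
    return " - ".join(name_parts), colour, size


samples = [
    "WOMEN'S CC CROPPED TOP T-SHIRT - MELANGE GREY - S",
    "WOMEN'S CC CROPPED TOP T-SHIRT- WHITE -XL",
    "JUNIOR EP ORGANIC T-SHIRT - YELLOW- 3-4 YRS",
    "EP ORGANIC BIB - PINK -ONE SIZE",
]

for sample in samples:
    print(split_description(sample))
```

Rows that yield fewer than three fields raise a ValueError here, so they can be logged and reviewed instead of being silently imported with the wrong colour or size.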
