reyman64

Reputation: 553

Regex to match any number AND any characters between quotes

I'm dealing with this odd CSV formatting, which contains unescaped commas:

   641,"Harstad/Narvik Airport, Evenes","Harstad/Narvik","Norway","EVE","ENEV",68.491302490234,16.678100585938,84,1,"E","Europe/Oslo","airport","OurAirports"  

I need to return a list like this:

[641,'Harstad/Narvik Airport Evenes', 'Harstad/Narvik', 'Norway', 'EVE', 'ENEV', 68.491302490234,16.678100585938,84,1, 'E', 'Europe/Oslo', 'airport', 'OurAirports']

I have two regexes, each matching part of the string:

Is there a way to merge the matches into a single result?
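Every comma that belongs to a value in the sample line sits inside a double-quoted field, which is ordinary CSV quoting. So, as a swapped-in alternative to a regex, Python's standard csv module may already split the line correctly. A minimal sketch, assuming the line is available as a plain string (the variable names are illustrative):

```python
import csv

line = '641,"Harstad/Narvik Airport, Evenes","Harstad/Narvik","Norway","EVE","ENEV",68.491302490234,16.678100585938,84,1,"E","Europe/Oslo","airport","OurAirports"'

# csv.reader accepts any iterable of lines; a comma inside a double-quoted
# field is treated as data, and the surrounding quotes are removed.
row = next(csv.reader([line]))
print(row[:3])  # ['641', 'Harstad/Narvik Airport, Evenes', 'Harstad/Narvik']
```

Note that csv.reader returns every field as a string, so numeric columns would still need converting to int or float afterwards.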

Upvotes: 1

Views: 256

Answers (1)

anubhava

Reputation: 786359

You may use this regex:

>>> import re
>>> s = '641,"Harstad/Narvik Airport, Evenes","Harstad/Narvik","Norway","EVE","ENEV",68.491302490234,16.678100585938,84,1,"E","Europe/Oslo","airport","OurAirports"'

>>> csvData = re.findall(r'"[^"\\]*(?:\\.[^"\\]*)*"|\d+(?:\.\d+)?', s)
>>> print(csvData)

['641', '"Harstad/Narvik Airport, Evenes"', '"Harstad/Narvik"', '"Norway"', '"EVE"', '"ENEV"', '68.491302490234', '16.678100585938', '84', '1', '"E"', '"Europe/Oslo"', '"airport"', '"OurAirports"']
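re.findall returns every match as a string, with the quotes still attached to the quoted fields, whereas the question asks for numbers and bare strings. A minimal post-processing sketch, assuming that quoted tokens should lose their outer quotes, tokens containing a dot become float, and all others become int (convert is a hypothetical helper, and backslash escapes inside quoted fields are left as-is):

```python
import re

s = '641,"Harstad/Narvik Airport, Evenes","Harstad/Narvik","Norway","EVE","ENEV",68.491302490234,16.678100585938,84,1,"E","Europe/Oslo","airport","OurAirports"'

# Same pattern as in the answer: a quoted string OR an unsigned integer/decimal.
pattern = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"|\d+(?:\.\d+)?')

def convert(token):
    """Hypothetical helper: turn one matched token into a Python value."""
    if token.startswith('"'):
        # Quoted field: strip the surrounding double quotes only.
        return token[1:-1]
    if '.' in token:
        return float(token)
    return int(token)

values = [convert(t) for t in pattern.findall(s)]
print(values[:4])  # [641, 'Harstad/Narvik Airport, Evenes', 'Harstad/Narvik', 'Norway']
```

This keeps the comma inside "Harstad/Narvik Airport, Evenes"; if it really should be dropped as in the expected output above, an extra replace on the quoted values would be needed.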

RegEx Details:

  • "[^"\\]*(?:\\.[^"\\]*)*": Match a double-quoted string as a single element, allowing escaped quotes or any other backslash-escaped character inside, e.g. "foo\"bar"
  • |: OR
  • \d+(?:\.\d+)?: Match an integer or a decimal number
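To see why the quoted-string branch is built from [^"\\]* and \\. rather than a simple "[^"]*", here is a small comparison on a made-up input containing an escaped quote (the sample line and the naive pattern are illustrative assumptions, not from the question):

```python
import re

full = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"|\d+(?:\.\d+)?')
naive = re.compile(r'"[^"]*"|\d+(?:\.\d+)?')  # ignores backslash escapes

# Raw string, so the input really contains a backslash before the inner quote.
line = r'1,"foo\"bar",2.5'

full_result = full.findall(line)    # the escaped quote stays inside one element
naive_result = naive.findall(line)  # the field is cut off at the escaped quote
print(full_result)
print(naive_result)
```

With the naive pattern, the match stops at the escaped quote, the trailing bar" is never captured as a field, and only the number 2.5 is picked up afterwards.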

Upvotes: 1
