Reputation: 442
I tried and this is the best I could come up with, need a fresh eye or help to finish this job.
Expression:
\"[a-zA-Z\s0-9\.\']*\"
Input String Data:
BOL,"AWBH0876356","HMM","H0010","BEANR","BEANR","AEJEA","BHBAH","","","T","S","","","F","N","","FCL/FCL","BE","","","","","","","","SUNNYLAND DISTRIBUTION NV","EVERDONGENLAAN 12 2300 TURNHOUT","","","INTERNATIONAL AGENCIES CO LTD","BUILDING 406, ROAD 4308, BLOCK 343, MANAMA BAHRAIN","","INTERNATIONAL AGENCIES CO LTD","BUILDING 406, ROAD 4308, BLOCK 343, MANAMA BAHRAIN","","","","","","","N/A","770000",""SHIPPER'S LOAD & COUNT, SAID TO BE:" 1X20'DC CONTAINER S.T.C 1650 CARTON OF JUICES FREIGHT PREPAID","1650","CARTONS","CTN","","","1","1","2.2","21615.0","23815.0","0","0","0","0","","",""
I need to ignore the first word (BOL
) and comma, which is working, but I am stuck with matches which have special character ('
,"
) in them.
Below match is an issue, for example:
""SHIPPER'S LOAD & COUNT, SAID TO BE:" 1X20'DC CONTAINER S.T.C 1650 CARTON OF JUICES FREIGHT PREPAID"
Upvotes: 0
Views: 509
Reputation: 71538
The current problem with your regex (and the string to be parsed) is that it doesn't accept that there is a quote inside a value and the string has. Maybe, you could specify that there can be a quote inside, but then, the closing quote should only be before a comma or at the end of the string, and you can do that with a positive lookhead:
".*?"(?=,|$)
".*?"
matches the value while (?=,|$)
ensures there's either a comma after it or the end of the string (indicated by $
).
Note that if your string has value containing a quote character followed by a comma, the above regex won't work.
In that case, what I usually do is count the number of matches. If this exceeds the number I expect, I put the original line separately so I can look at them one by one (that will involve some manual intervention, but that's better than ending up with lots of errors!).
If all the problems come from a single column, then you could change your script so that it will 'merge' values from i to j (i being the column number where the first issue occurs and j being the next one) until there's an appropriate number of values.
Upvotes: 2
Reputation: 67968
(?:^|(?<=,))"(?!,")(.+?)"(?=,"|$)
Try this.See demo.
https://regex101.com/r/tJ2mW5/4
Upvotes: 0