Regular expression to identify text between semi-colons that contains comma and spaces

Question

I am trying to identify text that contains a comma (,) followed by whitespace (\s+) in a CSV file that is semicolon (;) separated. Sample CSV entries are as follows:

09/03/2023;13;P;1210/2003 (OJ L169);2003-07-08;http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2003:169:0006:0023:EN:PDF;IRQ;(UNSC RESOLUTION 1483);;;;;;;;;;;;;;;;;;;;;;;;;;;14;13;1210/2003 (OJ L169);2003-07-08;http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2003:169:0006:0023:EN:PDF;IRQ;1937-04-28;al-Awja, near Tikrit;IRQ;;;;;;;;;;;;;;;;EU.27.28
09/03/2023;20;P;1210/2003 (OJ L169);2003-07-08;http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2003:169:0006:0023:EN:PDF;IRQ;(Saddam's second son);26;20;1210/2003 (OJ L169);2003-07-08;http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2003:169:0006:0023:EN:PDF;IRQ;Hussein Al-Tikriti;Qusay;Saddam;Qusay Saddam Hussein Al-Tikriti;M;;Oversaw Special Republican Guard, Special Security Organisation, and Republican Guard;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;EU.39.56

From the sample data I am trying to extract the following text:

al-Awja, near Tikrit
Oversaw Special Republican Guard, Special Security Organisation, and Republican Guard

Both target strings contain commas (,), which causes a problem when converting the semicolon (;) separated file into a comma (,) separated file: each existing comma in a string produces an extra column.
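For context, the extra-column problem can usually be avoided at conversion time by quoting any field that contains a comma, rather than by locating those fields first. The following is only a minimal sketch using Python's standard csv module; the function name semicolon_to_comma and the shortened sample row are made up for illustration, and it assumes the input does not use double quotes in a way that conflicts with CSV quoting.

```python
import csv
import io


def semicolon_to_comma(text: str) -> str:
    """Re-write semicolon-separated text as comma-separated text.

    Fields that already contain a comma are wrapped in double quotes
    (QUOTE_MINIMAL), so they stay in a single column.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO()
    writer = csv.writer(
        out, delimiter=",", quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
    )
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Shortened, hypothetical row based on the sample data above.
sample = "IRQ;1937-04-28;al-Awja, near Tikrit;IRQ;;EU.27.28\n"
converted = semicolon_to_comma(sample)
print(converted)
```

A CSV-aware reader should then see "al-Awja, near Tikrit" as one field, because the writer quoted it.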

So far I have the following regular expression, which gets me to the required text. However, I am unable to retrieve the entire string with it.

Regex: ([A-Za-z0-9-]+)([,])(\s+)([A-Za-z0-9-]+)
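The pattern above matches only one word on each side of a single comma, which is why the full string is not returned. One possible alternative, offered as a sketch rather than a definitive answer, is to match a whole field: any run of non-semicolon characters that contains a comma followed by whitespace. The names FIELD_WITH_COMMA and fields_with_commas and the shortened sample lines are hypothetical, and the pattern is assumed to be applied one line at a time.

```python
import re

# One whole field: no semicolons inside, and at least one comma
# that is followed by whitespace somewhere in the field.
FIELD_WITH_COMMA = re.compile(r"[^;]*,\s+[^;]*")


def fields_with_commas(line: str) -> list:
    """Return every semicolon-delimited field in `line` that contains ', '."""
    return FIELD_WITH_COMMA.findall(line)


# Shortened, hypothetical lines based on the sample data above.
line1 = "IRQ;1937-04-28;al-Awja, near Tikrit;IRQ;;;EU.27.28"
line2 = (
    "M;;Oversaw Special Republican Guard, Special Security Organisation, "
    "and Republican Guard;;;EU.39.56"
)

for line in (line1, line2):
    print(fields_with_commas(line))
```

Because [^;] also matches a newline, running the pattern over a whole file at once could let a match span two rows; applying it per line avoids that.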

Please help.

