Reputation: 3615
I need a regex to parse items in text.
Structure of data is:
I am using this regex:
.*\n(.+) (AA|BB|CC|DD|EE|[, ]+){0,6}(\d+).*
With this text string:
Sveiki,
I need these items:
1508-dkh-ą9 AA, BB 100
1efae 468 BB, CC 100
2efae 468 BB 100
3efae 468 100
Ačiū už dėmesį ir skirtą laiką!
It returns:
<row>
<ID>0</ID>
<Match>1508-dkh-ą9 AA, BB 100</Match>
<Group1>1508-dkh-ą9 AA, BB</Group1>
<Group2></Group2>
<Group3>100</Group3>
</row>
<row>
<ID>1</ID>
<Match>1efae 468 BB, CC 100</Match>
<Group1>1efae 468 BB, CC</Group1>
<Group2></Group2>
<Group3>100</Group3>
</row>
<row>
<ID>2</ID>
<Match>2efae 468 BB 100</Match>
<Group1>2efae 468 BB</Group1>
<Group2></Group2>
<Group3>100</Group3>
</row>
<row>
<ID>3</ID>
<Match>3efae 468 100</Match>
<Group1>3efae 468</Group1>
<Group2></Group2>
<Group3>100</Group3>
</row>
But I need a result like this:
<row>
<ID>0</ID>
<Match>1508-dkh-ą9 AA, BB 100</Match>
<Group1>1508-dkh-ą9</Group1>
<Group2>AA, BB</Group2>
<Group3>100</Group3>
</row>
....
How can I achieve this result? Maybe there is a better solution than regex?
Upvotes: 0
Views: 231
Reputation: 2297
Try this (you might need to adjust it slightly depending on the language you are using):
^(.+?)((?:AA|BB|CC|DD|EE|[, ])*) ([0-9]+)$
The question mark in the first group makes it lazy, so it matches as little as possible and leaves the optional flags for the second group instead of swallowing them.
Try it out at http://gskinner.com/RegExr/?375ce
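A small Python sketch of the lazy-versus-greedy difference described above. The two patterns are illustrative assumptions, not the exact regex from this answer: they list the flags as a non-capturing alternation and differ only in `.+` versus `.+?`. The variable names are made up for the example.

```python
import re

line = "1efae 468 BB, CC 100"

# Hypothetical flag group: any run of the known flags, commas and spaces.
flags = r"((?:AA|BB|CC|DD|EE|[, ])*)"

# Same pattern twice; only the first group's quantifier differs.
greedy = re.compile(r"^(.+)" + flags + r" ([0-9]+)$")
lazy = re.compile(r"^(.+?)" + flags + r" ([0-9]+)$")

greedy_groups = greedy.match(line).groups()
lazy_groups = lazy.match(line).groups()

print(greedy_groups)  # the first group swallows the flags, the second is empty
print(lazy_groups)    # the flags land in the second group
```

With the lazy version the second group keeps a leading space (`" BB, CC"`), so you would probably want to call `.strip()` on it before using it.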
Upvotes: 1
Reputation: 473
The following regex works for the example posted in the question:
^(.+?) ((?:AA|BB|CC|DD|EE|[, ]+){0,6})(\d+)$
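For reference, here is a minimal sketch of running that pattern over the sample text from the question in Python. It assumes multi-line mode (`re.MULTILINE`) so that `^` and `$` anchor at each line; the question does not say which tool or language is in use, so the equivalent option may be named differently there. The `rows` name is only for illustration.

```python
import re

text = """Sveiki,
I need these items:
1508-dkh-ą9 AA, BB 100
1efae 468 BB, CC 100
2efae 468 BB 100
3efae 468 100
Ačiū už dėmesį ir skirtą laiką!"""

# re.MULTILINE is an assumption: it lets ^ and $ match at every line boundary.
pattern = re.compile(
    r"^(.+?) ((?:AA|BB|CC|DD|EE|[, ]+){0,6})(\d+)$",
    re.MULTILINE,
)

# Group 2 can end with a trailing space, so strip it.
rows = [
    (m.group(1), m.group(2).strip(), m.group(3))
    for m in pattern.finditer(text)
]

for row in rows:
    print(row)
```

The greeting and closing lines are skipped because they do not end in a number, and a line without flags simply yields an empty second group.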
Upvotes: 1