Reputation: 1424
I am trying to write a regular expression for the String shown below, which comes in via a TCP/IP socket.
$AVSYS,99999999,V1.17,SN0000103,32768*16
Each string should start with a dollar sign $ followed by 5 or 6 capital letters. It ends with * and a 2-character alphanumeric checksum. The fields are separated by commas and can be any string.
I created a regular expression for it.
^\$[A-Z]{5,6}(\,.*)(\,.*)(\,.*)(\,.*)(\,.*)\*[\d\w]{2}$
I expected that it would match, but it did not. I am still not familiar with regular expressions even though I have read through the Java docs. Please help me get the correct regular expression.
I tried these two after fixing my regular expression according to the replies.
^\$[A-Z]{5,6}(\,.*)(\,.*)(\,.*)(\,.*)(\,.*)\*[\d\w]{2}$
^\$[A-Z]{5,6}(\,.*?)(\,.*?)(\,.*?)(\,.*?)\*[\d\w]{2}$
But I got more matches than I expected.
$AVSYS,99999999,V1.17,SN0000103,32768*16
$AVRMC,80000551,144811,A,4351.3789,N,07923.4712,W,0.00,153.45,091107,A,,161,1*64
$AVRMC,80000551,144811,A,4351.3789,N,07923.4712,W,0.00,153.45,091107,A,,161,1,0,0*64
$EAVSYS,99999999,12345678901234567890,9057621228,,,*0B
All of the above sentences match the regexp, but I want to match only the first one. How can I achieve this?
Upvotes: 1
Views: 186
Reputation: 3994
Your regex has an extra sub-group. This should work:
^\$[A-Z]{5,6}(\,.*)(\,.*)(\,.*)(\,.*)\*[\d\w]{2}$
As a side note, this should also work, and likely more efficiently, since it cuts down on backtracking; the added '?' makes each quantifier match in a non-greedy fashion.
^\$[A-Z]{5,6}(\,.*?)(\,.*?)(\,.*?)(\,.*?)\*[\d\w]{2}$
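To see the effect of the extra sub-group, here is a minimal Java sketch that runs the sample sentence from the question against the five-group original and the two four-group versions. The class and constant names are made up for illustration.

```java
import java.util.regex.Pattern;

class GroupCountDemo {
    // Sample sentence from the question: four comma-separated fields.
    static final String SAMPLE = "$AVSYS,99999999,V1.17,SN0000103,32768*16";

    // Original pattern: five ",..." groups, so it needs at least five commas.
    static final String FIVE_GROUPS =
        "^\\$[A-Z]{5,6}(\\,.*)(\\,.*)(\\,.*)(\\,.*)(\\,.*)\\*[\\d\\w]{2}$";
    // Four groups, greedy and non-greedy.
    static final String FOUR_GREEDY =
        "^\\$[A-Z]{5,6}(\\,.*)(\\,.*)(\\,.*)(\\,.*)\\*[\\d\\w]{2}$";
    static final String FOUR_LAZY =
        "^\\$[A-Z]{5,6}(\\,.*?)(\\,.*?)(\\,.*?)(\\,.*?)\\*[\\d\\w]{2}$";

    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(FIVE_GROUPS, SAMPLE)); // false: only four commas
        System.out.println(matches(FOUR_GREEDY, SAMPLE)); // true
        System.out.println(matches(FOUR_LAZY, SAMPLE));   // true
    }
}
```

Note that `.*` and `.*?` can both run across commas, so the four-group versions also accept sentences with more than four fields.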
As to your new edits, you could use the following:
^\$[A-Z]{5,6}(\,[^\,]+?)(\,[^\,]+?)(\,[^\,]+?)(\,[^\,]+?)\*[\d\w]{2}$
i.e., each field must contain at least one character and cannot itself contain a comma.
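A quick way to check this against the four sentences from the question is a small Java sketch like the one below. The class and method names are hypothetical, and the comma is left unescaped, which does not change what the pattern matches.

```java
import java.util.regex.Pattern;

class SentenceFilter {
    // Exactly four fields, each non-empty and comma-free,
    // then '*' and a two-character checksum.
    static final Pattern FOUR_FIELDS = Pattern.compile(
        "^\\$[A-Z]{5,6}(,[^,]+?)(,[^,]+?)(,[^,]+?)(,[^,]+?)\\*[\\d\\w]{2}$");

    static boolean matches(String sentence) {
        return FOUR_FIELDS.matcher(sentence).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "$AVSYS,99999999,V1.17,SN0000103,32768*16",
            "$AVRMC,80000551,144811,A,4351.3789,N,07923.4712,W,0.00,153.45,091107,A,,161,1*64",
            "$AVRMC,80000551,144811,A,4351.3789,N,07923.4712,W,0.00,153.45,091107,A,,161,1,0,0*64",
            "$EAVSYS,99999999,12345678901234567890,9057621228,,,*0B"
        };
        // Only the first sample has exactly four non-empty fields.
        for (String s : samples) {
            System.out.println(matches(s) + "  " + s);
        }
    }
}
```

Because `[^,]` cannot cross a comma, each group is pinned to a single field, so a sentence with a fifth field or an empty field fails.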
Upvotes: 1
Reputation: 91373
To avoid empty groups, just replace * with +:
^\$[A-Z]{5,6}(,.+?)(,.+?)(,.+?)(,.+?)\*\w{2}$
There is no need to escape the comma, and \w includes \d; in fact, \w is equivalent to [a-zA-Z0-9_].
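Both points are easy to confirm in Java. The sketch below (class and method names are only illustrative) checks a few two-character checksums against \w{2} and compares the escaped and plain comma. One thing to keep in mind: \w also accepts the underscore, so it is slightly wider than "alphanumeric".

```java
import java.util.regex.Pattern;

class WordClassCheck {
    // True when the whole input is exactly two \w characters.
    static boolean isTwoWordChars(String s) {
        return Pattern.matches("\\w{2}", s);
    }

    public static void main(String[] args) {
        System.out.println(isTwoWordChars("16")); // true: digits are part of \w
        System.out.println(isTwoWordChars("0B")); // true
        System.out.println(isTwoWordChars("_9")); // true: underscore counts too
        System.out.println(isTwoWordChars("*6")); // false
        // An escaped comma matches the same thing as a plain comma.
        System.out.println(Pattern.matches("\\,", ",")); // true
    }
}
```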
Upvotes: 0
Reputation: 726479
To match any number of comma-separated groups, you can use this expression:
^\$[A-Z]{5,6}(,[^,*]*)*\*[\da-zA-Z]{2}$
The data portion is matched by this expression:
(,[^,*]*)*
It matches zero or more groups, each of which starts with a comma and is followed by any number of characters other than a comma or an asterisk. Once a comma or an asterisk is reached, the expression engine checks whether it begins a new value or the check sum at the end.
If the check sum does not allow lowercase letters, replace a-zA-Z with A-Z.
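A repeated group only keeps its last capture, so to get at the individual values one option is to validate with the expression and then split the data portion. This is a rough Java sketch under that assumption; `SentenceParser` and `fields` are hypothetical names, not part of any library.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

class SentenceParser {
    static final Pattern SENTENCE =
        Pattern.compile("^\\$[A-Z]{5,6}(,[^,*]*)*\\*[\\da-zA-Z]{2}$");

    // Returns the data fields of a valid sentence, or an empty list
    // when the sentence is invalid or carries no fields at all.
    static List<String> fields(String sentence) {
        if (!SENTENCE.matcher(sentence).matches()) {
            return Collections.emptyList();
        }
        int firstComma = sentence.indexOf(',');
        if (firstComma < 0) {
            return Collections.emptyList(); // header and checksum only
        }
        // The pattern guarantees exactly one '*', right before the checksum.
        int star = sentence.lastIndexOf('*');
        String data = sentence.substring(firstComma + 1, star);
        // Limit -1 keeps trailing empty fields such as ",,,".
        return Arrays.asList(data.split(",", -1));
    }

    public static void main(String[] args) {
        System.out.println(fields("$AVSYS,99999999,V1.17,SN0000103,32768*16"));
        System.out.println(fields("$EAVSYS,99999999,12345678901234567890,9057621228,,,*0B").size());
    }
}
```

With the fields in a list, picking out sentences like the first sample becomes an ordinary check, for example that the list has four entries and none of them is empty.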
Upvotes: 1