sunghun

Reputation: 1424

How to get a regular expression that matches these strings

I am trying to write a regular expression for a string like the one shown below, which arrives via a TCP/IP socket.

$AVSYS,99999999,V1.17,SN0000103,32768*16

Each string should start with a dollar sign $ followed by 5 or 6 capital letters. It ends with * and a 2-character alphanumeric checksum. The fields are separated by commas (,), and each field can be any string.

I created a regular expression for it.

^\$[A-Z]{5,6}(\,.*)(\,.*)(\,.*)(\,.*)(\,.*)\*[\d\w]{2}$

I expected it to match, but it did not. I am still not familiar with regular expressions, even though I have read through the Java docs. Please help me get the correct regular expression.

Edit

After fixing my regular expression according to the replies, I tried these two:

^\$[A-Z]{5,6}(\,.*)(\,.*)(\,.*)(\,.*)\*[\d\w]{2}$
^\$[A-Z]{5,6}(\,.*?)(\,.*?)(\,.*?)(\,.*?)\*[\d\w]{2}$

But they match more strings than I expected:

$AVSYS,99999999,V1.17,SN0000103,32768*16
$AVRMC,80000551,144811,A,4351.3789,N,07923.4712,W,0.00,153.45,091107,A,,161,1*64
$AVRMC,80000551,144811,A,4351.3789,N,07923.4712,W,0.00,153.45,091107,A,,161,1,0,0*64
$EAVSYS,99999999,12345678901234567890,9057621228,,,*0B

All of the lines above match the regex, but I want to match only the first one (the $AVSYS line). How can I achieve this?
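The behavior described above can be reproduced with a small Java harness. This is a sketch (the class name is hypothetical, and List.of needs Java 9 or later) that runs the four-group greedy and lazy patterns over the four lines; both report true for every line, because '.' also matches a comma, so one group can span several fields.

```java
import java.util.List;
import java.util.regex.Pattern;

public class ReproduceOverMatch {
    // The four sample lines from the question.
    static final List<String> LINES = List.of(
            "$AVSYS,99999999,V1.17,SN0000103,32768*16",
            "$AVRMC,80000551,144811,A,4351.3789,N,07923.4712,W,0.00,153.45,091107,A,,161,1*64",
            "$AVRMC,80000551,144811,A,4351.3789,N,07923.4712,W,0.00,153.45,091107,A,,161,1,0,0*64",
            "$EAVSYS,99999999,12345678901234567890,9057621228,,,*0B");

    // Four-group greedy and lazy versions. '.' matches commas too, so any line with
    // at least four commas before the "*XX" tail can be split into four groups.
    static final Pattern GREEDY = Pattern.compile(
            "^\\$[A-Z]{5,6}(\\,.*)(\\,.*)(\\,.*)(\\,.*)\\*[\\d\\w]{2}$");
    static final Pattern LAZY = Pattern.compile(
            "^\\$[A-Z]{5,6}(\\,.*?)(\\,.*?)(\\,.*?)(\\,.*?)\\*[\\d\\w]{2}$");

    public static void main(String[] args) {
        for (String line : LINES) {
            System.out.println(GREEDY.matcher(line).matches() + " "
                    + LAZY.matcher(line).matches() + "  " + line);
        }
    }
}
```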

Upvotes: 1

Views: 186

Answers (3)

Roney Michael

Reputation: 3994

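Your regex has one sub-group too many: each (\,.*) group needs its own comma, and the sample has only four commas after the prefix, so five groups cannot match. This should work: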

^\$[A-Z]{5,6}(\,.*)(\,.*)(\,.*)(\,.*)\*[\d\w]{2}$
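To check the extra-group explanation, here is a minimal Java sketch (class name hypothetical) that runs both the question's five-group pattern and the four-group pattern against the sample line. Each (\,.*) group must consume one literal comma and the sample contains exactly four, so only the four-group version matches.

```java
import java.util.regex.Pattern;

public class ExtraGroupCheck {
    static final String SAMPLE = "$AVSYS,99999999,V1.17,SN0000103,32768*16";

    // Question's pattern: five comma-prefixed groups, so it needs at least five commas.
    static final Pattern FIVE_GROUPS = Pattern.compile(
            "^\\$[A-Z]{5,6}(\\,.*)(\\,.*)(\\,.*)(\\,.*)(\\,.*)\\*[\\d\\w]{2}$");

    // Answer's pattern: one group removed, matching the four fields in the sample.
    static final Pattern FOUR_GROUPS = Pattern.compile(
            "^\\$[A-Z]{5,6}(\\,.*)(\\,.*)(\\,.*)(\\,.*)\\*[\\d\\w]{2}$");

    public static void main(String[] args) {
        long commas = SAMPLE.chars().filter(c -> c == ',').count();
        System.out.println("commas: " + commas);                                     // 4
        System.out.println("five groups: " + FIVE_GROUPS.matcher(SAMPLE).matches()); // false
        System.out.println("four groups: " + FOUR_GROUPS.matcher(SAMPLE).matches()); // true
    }
}
```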

On a side note, this should also work, and likely more efficiently, since it can avoid a lot of backtracking; the added '?' makes each group match in a non-greedy (lazy) fashion.

^\$[A-Z]{5,6}(\,.*?)(\,.*?)(\,.*?)(\,.*?)\*[\d\w]{2}$
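Both patterns accept the same lines; what the '?' changes is how the matched text is divided among the groups. The sketch below (class and method names hypothetical) prints the captures for the first $AVRMC line from the question. It illustrates the greedy/lazy difference only and does not benchmark the efficiency claim.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    static final String AVRMC =
            "$AVRMC,80000551,144811,A,4351.3789,N,07923.4712,W,0.00,153.45,091107,A,,161,1*64";

    static final Pattern GREEDY = Pattern.compile(
            "^\\$[A-Z]{5,6}(\\,.*)(\\,.*)(\\,.*)(\\,.*)\\*[\\d\\w]{2}$");
    static final Pattern LAZY = Pattern.compile(
            "^\\$[A-Z]{5,6}(\\,.*?)(\\,.*?)(\\,.*?)(\\,.*?)\\*[\\d\\w]{2}$");

    // Returns the captured groups, or null if the whole line does not match.
    static String[] groups(Pattern p, String s) {
        Matcher m = p.matcher(s);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy: the first group grabs as much as it can, leaving one comma per remaining group.
        System.out.println(String.join(" | ", groups(GREEDY, AVRMC)));
        // Lazy: each group stops at the first comma it can; the last group absorbs the rest.
        System.out.println(String.join(" | ", groups(LAZY, AVRMC)));
    }
}
```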

As for your edit, you could use the following:

^\$[A-Z]{5,6}(\,[^\,]+?)(\,[^\,]+?)(\,[^\,]+?)(\,[^\,]+?)\*[\d\w]{2}$

i.e.,

  1. Replace the '.' with '[^\,]' to match any character except a comma rather than just any character at all; and
  2. Replace '*' with '+' to avoid matches of length zero.
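A quick Java check of this pattern against the four lines from the question (class name hypothetical; List.of needs Java 9+). Because no group can contain a comma and every group needs at least one character, only a line with exactly four non-empty fields passes, which here is the $AVSYS line.

```java
import java.util.List;
import java.util.regex.Pattern;

public class NoCommaFields {
    static final List<String> LINES = List.of(
            "$AVSYS,99999999,V1.17,SN0000103,32768*16",
            "$AVRMC,80000551,144811,A,4351.3789,N,07923.4712,W,0.00,153.45,091107,A,,161,1*64",
            "$AVRMC,80000551,144811,A,4351.3789,N,07923.4712,W,0.00,153.45,091107,A,,161,1,0,0*64",
            "$EAVSYS,99999999,12345678901234567890,9057621228,,,*0B");

    // Each group is a comma followed by one or more non-comma characters, so a group
    // cannot span several fields, and empty fields are rejected.
    static final Pattern EXACTLY_FOUR = Pattern.compile(
            "^\\$[A-Z]{5,6}(\\,[^\\,]+?)(\\,[^\\,]+?)(\\,[^\\,]+?)(\\,[^\\,]+?)\\*[\\d\\w]{2}$");

    public static void main(String[] args) {
        for (String line : LINES) {
            System.out.println(EXACTLY_FOUR.matcher(line).matches() + "  " + line);
        }
    }
}
```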

Upvotes: 1

Toto

Reputation: 91373

To avoid empty groups, just replace * with +:

^\$[A-Z]{5,6}(,.+?)(,.+?)(,.+?)(,.+?)\*\w{2}$

There's no need to escape the comma, and \w already includes \d; in fact, it's equivalent to [a-zA-Z0-9_].
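Both remarks can be checked directly with java.util.regex under the default flags. A minimal sketch with hypothetical method names: an escaped and an unescaped comma match the same input, and \w accepts exactly the same ASCII characters as [a-zA-Z0-9_]. With Pattern.UNICODE_CHARACTER_CLASS, \w becomes wider; that flag is not used here.

```java
import java.util.regex.Pattern;

public class EscapeAndWordClass {
    // True if both the unescaped and the escaped comma pattern match a single comma.
    static boolean commaEscapeIsRedundant() {
        return Pattern.matches(",", ",") && Pattern.matches("\\,", ",");
    }

    // True if \w and [a-zA-Z0-9_] accept exactly the same ASCII characters (default flags).
    static boolean wordClassMatchesAsciiSet() {
        Pattern w = Pattern.compile("\\w");
        Pattern explicit = Pattern.compile("[a-zA-Z0-9_]");
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (w.matcher(s).matches() != explicit.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    // True if every ASCII digit is also a word character, so [\d\w] reduces to \w.
    static boolean digitsAreWordChars() {
        return Pattern.matches("\\w{10}", "0123456789");
    }

    public static void main(String[] args) {
        System.out.println(commaEscapeIsRedundant());
        System.out.println(wordClassMatchesAsciiSet());
        System.out.println(digitsAreWordChars());
    }
}
```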

Upvotes: 0

Sergey Kalinichenko

Reputation: 726479

To match any number of comma-separated groups, you can use this expression:

^\$[A-Z]{5,6}(,[^,*]*)*\*[\da-zA-Z]{2}$

The data portion is matched by this expression:

(,[^,*]*)*

It matches zero or more groups, each starting with a comma followed by any number of characters other than a comma or an asterisk. When the engine reaches a comma or an asterisk, it checks whether that starts a new value or the checksum at the end.

If the checksum does not allow lowercase letters, replace a-zA-Z with A-Z.
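A Java sketch of this answer's pattern against the four lines from the question (class, field, and method names hypothetical; List.of needs Java 9+). As described, it accepts any number of fields, so all four lines match. EXACTLY_FOUR is a hypothetical variation, not part of the answer: it fixes the count with {4}, uses + to reject empty values, and applies the uppercase-only checksum suggested above, which leaves only the $AVSYS line. Note also that a repeated capturing group such as (,[^,*]*)* keeps only its last iteration, so group 1 holds just the final field.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedFieldGroup {
    static final List<String> LINES = List.of(
            "$AVSYS,99999999,V1.17,SN0000103,32768*16",
            "$AVRMC,80000551,144811,A,4351.3789,N,07923.4712,W,0.00,153.45,091107,A,,161,1*64",
            "$AVRMC,80000551,144811,A,4351.3789,N,07923.4712,W,0.00,153.45,091107,A,,161,1,0,0*64",
            "$EAVSYS,99999999,12345678901234567890,9057621228,,,*0B");

    // Answer's pattern: any number of ",value" groups; a value contains neither ',' nor '*'.
    static final Pattern ANY_COUNT = Pattern.compile(
            "^\\$[A-Z]{5,6}(,[^,*]*)*\\*[\\da-zA-Z]{2}$");

    // Hypothetical variation: exactly four non-empty values and an uppercase-only checksum.
    static final Pattern EXACTLY_FOUR = Pattern.compile(
            "^\\$[A-Z]{5,6}(,[^,*]+){4}\\*[\\dA-Z]{2}$");

    // A repeated capturing group only remembers its final iteration.
    static String lastCapture(String line) {
        Matcher m = ANY_COUNT.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String line : LINES) {
            System.out.println(ANY_COUNT.matcher(line).matches() + " "
                    + EXACTLY_FOUR.matcher(line).matches() + "  " + line);
        }
        System.out.println(lastCapture(LINES.get(0))); // ,32768
    }
}
```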

Upvotes: 1
