Reputation: 68466
I am trying to match ticker symbols that have the following format:
Part1: A market identifier code (MIC), which specifies the exchange on which the securities are traded. The code is unique, four characters long, and starts with X followed by a three-letter code that identifies the market, such as XNAS for the Nasdaq market.
Part1 is separated from Part2 by a colon.
Part2: A ticker code, which has two parts:
(a) The security code, typically anywhere from 1 character (F for Ford) to 5 characters (VFIAX for the Vanguard 500 Index).
(b) An optional part, which can be further split into:
(i) Expiration date: 6 digits in the format yymmdd
(ii) Option type: either P or C, for put or call
(iii) Strike price: the price x 1000, front-padded with 0s to 8 digits
A gotcha that I need to handle: when the optional part is present, the security code is sometimes padded with spaces to 6 characters.
So I need to match the following valid tickers:
XLON:SBRY
XNAS:TSLA
XCME:SPX 141122P00019500
XNAS:AAPL200918C00032500
My regexfu is not great, and this is what I've managed to come up with so far:
^(X)(A-Z){3}(:)(\d|[A-Z]){1,6}\s
What is the correct regex that matches all of the above valid ticker symbols and matches the parts correctly?
Upvotes: 0
Views: 2316
Reputation: 163457
You could get the matches using:
^X[A-Z]{3}:[A-Z]{1,5}(?:\s*\d{6}[PC]\d{8})?$
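As a quick sanity check, the pattern above can be run against the four tickers from the question. This is only a minimal sketch in Python (the question does not say which regex engine is in use); the names `TICKER_RE`, `samples` and `results` are made up for illustration.

```python
import re

# Pattern copied unchanged from the answer.
TICKER_RE = re.compile(r"^X[A-Z]{3}:[A-Z]{1,5}(?:\s*\d{6}[PC]\d{8})?$")

# The four valid tickers listed in the question.
samples = [
    "XLON:SBRY",
    "XNAS:TSLA",
    "XCME:SPX 141122P00019500",
    "XNAS:AAPL200918C00032500",
]

# Map each ticker to whether the pattern accepts it.
results = {ticker: bool(TICKER_RE.match(ticker)) for ticker in samples}

for ticker, ok in results.items():
    print(f"{ticker!r}: {ok}")
```

Note that `\s*` also accepts tabs and newlines, not just spaces, so this version is deliberately loose about the padding.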
A slightly more precise pattern, which also checks the month and day ranges and allows the space padding after the security code, could be:
^X[A-Z]{3}:[A-Z]{1,5}(?: {0,6}\d{2}(?:0[1-9]|1[012])(?:0[1-9]|[12][0-9]|3[01])[PC]\d{8})?$
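Since the question also asks for the parts to be matched, one possible approach is to add named groups to the same pattern. The sketch below assumes Python; the group names, the `parse_ticker` helper, and the choice to read two-digit years as 20yy are illustrative assumptions, not part of the original answer.

```python
import re
from datetime import date

# Same pattern as above, with named groups added around each part.
TICKER_RE = re.compile(
    r"^(?P<mic>X[A-Z]{3}):(?P<symbol>[A-Z]{1,5})"
    r"(?: {0,6}(?P<expiry>\d{2}(?:0[1-9]|1[012])(?:0[1-9]|[12][0-9]|3[01]))"
    r"(?P<kind>[PC])(?P<strike>\d{8}))?$"
)


def parse_ticker(text):
    """Return a dict of ticker parts, or None if the text is not a valid ticker."""
    m = TICKER_RE.match(text)
    if m is None:
        return None
    parts = {"mic": m["mic"], "symbol": m["symbol"]}
    if m["expiry"] is not None:
        yy, mm, dd = (int(m["expiry"][i:i + 2]) for i in (0, 2, 4))
        try:
            # Assumption: two-digit years are in the 2000s.
            parts["expiry"] = date(2000 + yy, mm, dd)
        except ValueError:
            # The regex allows impossible dates such as 0231 (31 February).
            return None
        parts["option_type"] = "put" if m["kind"] == "P" else "call"
        # The strike is stored as price x 1000, so divide to recover it.
        parts["strike"] = int(m["strike"]) / 1000
    return parts


print(parse_ticker("XLON:SBRY"))
print(parse_ticker("XCME:SPX 141122P00019500"))
```

The regex only checks that the month is 01-12 and the day is 01-31, so the `date(...)` call is what rejects combinations like 31 February.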
Explanation
^
Start of string
X[A-Z]{3}:
Match X, 3 chars A-Z and :
[A-Z]{1,5}
Match 1-5 times A-Z
(?:
Non capture group
 {0,6}\d{2}
Match 0-6 spaces (the token begins with a literal space) and 2 digits
(?:0[1-9]|1[012])
Match a month part 01 - 12
(?:0[1-9]|[12][0-9]|3[01])
Match a day part 01 - 31
[PC]\d{8}
Match P or C and 8 digits
)?
Close group and make it optional
$
End of string
Upvotes: 2