Reputation: 67290
I'm searching for UUIDs in blocks of text using a regex. Currently I'm relying on the assumption that all UUIDs will follow a patttern of 8-4-4-4-12 hexadecimal digits.
Can anyone think of a use case where this assumption would be invalid and would cause me to miss some UUIDs?
Upvotes: 403
Views: 468698
Reputation: 450
The pattern I wrote in JavaScript for me is:
re = /^[\da-f]{8}-([\da-f]{4}-){3}[\da-f]{12}$/i;
This is smaller than the examples above and should work fine.
Note: Don't use the examples that use \w
because they are less strict and might match improper uuid's.
Upvotes: 1
Reputation: 19668
I just want to share the smallest regexp way to do the same of the good answers here.
^[a-f\d]{8}(-[a-f\d]{4}){3}-[a-f\d]{12}$
Please use with ignore case flag i
for ignore case / case unsensitive:
const pattern = /^[a-f\d]{8}(-[a-f\d]{4}){3}-[a-f\d]{12}$/i // JavaScript
pattern = re.compile(r"^[a-f\d]{8}(-[a-f\d]{4}){3}-[a-f\d]{12}$", re.IGNORECASE) # Python
$pattern = '/^[a-f\d]{8}(-[a-f\d]{4}){3}-[a-f\d]{12}$/i' // php
Upvotes: 2
Reputation: 9308
If using POSIX regex (grep -E
, MySQL, etc.), this may be easier to read & remember:
[[:xdigit:]]{8}(-[[:xdigit:]]{4}){3}-[[:xdigit:]]{12}
Perl & PCRE flavours also support POSIX character classes so that'll work with them. For those, change the (…)
to a non-capturing subgroup (?:…)
.
JavaScript (and other syntaxes that support Unicode properties) can use a similarly legible version:
/\p{Hex_Digit}{8}(?:-\p{Hex_Digit}{4}){3}-\p{Hex_Digit}{12}/u
Upvotes: 11
Reputation: 13289
The regex for uuid is:
[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
If you want to enforce the full string to match this regex, you will sometimes (your matcher API may have a method) need to surround above expression with ^...$
, that is
^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$
Upvotes: 738
Reputation: 1290
Official uuid library uses following regex:
/^(?:[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}|00000000-0000-0000-0000-000000000000)$/i
See reference
Upvotes: 1
Reputation: 318
Wanted to give my contribution, as my regex cover all cases from OP and correctly group all relevant data on the group method (you don't need to post process the string to get each part of the uuid, this regex already get it for you)
([\d\w]{8})-?([\d\w]{4})-?([\d\w]{4})-?([\d\w]{4})-?([\d\w]{12})|[{0x]*([\d\w]{8})[0x, ]{4}([\d\w]{4})[0x, ]{4}([\d\w]{4})[0x, {]{5}([\d\w]{2})[0x, ]{4}([\d\w]{2})[0x, ]{4}([\d\w]{2})[0x, ]{4}([\d\w]{2})[0x, ]{4}([\d\w]{2})[0x, ]{4}([\d\w]{2})[0x, ]{4}([\d\w]{2})[0x, ]{4}([\d\w]{2})
Upvotes: 0
Reputation: 3913
Here is the working REGEX: https://www.regextester.com/99148
const regex = [0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}
Upvotes: 9
Reputation: 6018
For bash:
grep -E "[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12}"
For example:
$> echo "f2575e6a-9bce-49e7-ae7c-bff6b555bda4" | grep -E "[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12}"
f2575e6a-9bce-49e7-ae7c-bff6b555bda4
Upvotes: 5
Reputation: 71
$UUID_RE = join '-', map { "[0-9a-f]{$_}" } 8, 4, 4, 4, 12;
BTW, allowing only 4 on one of the positions is only valid for UUIDv4. But v4 is not the only UUID version that exists. I have met v1 in my practice as well.
Upvotes: 3
Reputation: 17392
/^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89AB][0-9a-f]{3}-[0-9a-f]{12}$/i
Gajus' regexp rejects UUID V1-3 and 5, even though they are valid.
Upvotes: 43
Reputation: 70339
In python re, you can span from numberic to upper case alpha. So..
import re
test = "01234ABCDEFGHIJKabcdefghijk01234abcdefghijkABCDEFGHIJK"
re.compile(r'[0-f]+').findall(test) # Bad: matches all uppercase alpha chars
## ['01234ABCDEFGHIJKabcdef', '01234abcdef', 'ABCDEFGHIJK']
re.compile(r'[0-F]+').findall(test) # Partial: does not match lowercase hex chars
## ['01234ABCDEF', '01234', 'ABCDEF']
re.compile(r'[0-F]+', re.I).findall(test) # Good
## ['01234ABCDEF', 'abcdef', '01234abcdef', 'ABCDEF']
re.compile(r'[0-f]+', re.I).findall(test) # Good
## ['01234ABCDEF', 'abcdef', '01234abcdef', 'ABCDEF']
re.compile(r'[0-Fa-f]+').findall(test) # Good (with uppercase-only magic)
## ['01234ABCDEF', 'abcdef', '01234abcdef', 'ABCDEF']
re.compile(r'[0-9a-fA-F]+').findall(test) # Good (with no magic)
## ['01234ABCDEF', 'abcdef', '01234abcdef', 'ABCDEF']
That makes the simplest Python UUID regex:
re_uuid = re.compile("[0-F]{8}-([0-F]{4}-){3}[0-F]{12}", re.I)
I'll leave it as an exercise to the reader to use timeit to compare the performance of these.
Enjoy. Keep it Pythonic™!
NOTE: Those spans will also match :;<=>?@'
so, if you suspect that could give you false positives, don't take the shortcut. (Thank you Oliver Aubert for pointing that out in the comments.)
Upvotes: 10
Reputation: 6900
If you want to check or validate a specific UUID version, here are the corresponding regexes.
Note that the only difference is the version number, which is explained in
4.1.3. Version
chapter of UUID 4122 RFC.
The version number is the first character of the third group : [VERSION_NUMBER][0-9A-F]{3}
:
UUID v1 :
/^[0-9A-F]{8}-[0-9A-F]{4}-[1][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i
UUID v2 :
/^[0-9A-F]{8}-[0-9A-F]{4}-[2][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i
UUID v3 :
/^[0-9A-F]{8}-[0-9A-F]{4}-[3][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i
UUID v4 :
/^[0-9A-F]{8}-[0-9A-F]{4}-[4][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i
UUID v5 :
/^[0-9A-F]{8}-[0-9A-F]{4}-[5][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i
Upvotes: 158
Reputation: 25476
For UUID generated on OS X with uuidgen
, the regex pattern is
[A-F0-9]{8}-[A-F0-9]{4}-4[A-F0-9]{3}-[89AB][A-F0-9]{3}-[A-F0-9]{12}
Verify with
uuidgen | grep -E "[A-F0-9]{8}-[A-F0-9]{4}-4[A-F0-9]{3}-[89AB][A-F0-9]{3}-[A-F0-9]{12}"
Upvotes: 6
Reputation: 1076
[\w]{8}(-[\w]{4}){3}-[\w]{12}
has worked for me in most cases.
Or if you want to be really specific [\w]{8}-[\w]{4}-[\w]{4}-[\w]{4}-[\w]{12}
.
Upvotes: 24
Reputation: 4798
Variant for C++:
#include <regex> // Required include
...
// Source string
std::wstring srcStr = L"String with GIUD: {4d36e96e-e325-11ce-bfc1-08002be10318} any text";
// Regex and match
std::wsmatch match;
std::wregex rx(L"(\\{[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12}\\})", std::regex_constants::icase);
// Search
std::regex_search(srcStr, match, rx);
// Result
std::wstring strGUID = match[1];
Upvotes: 6
Reputation: 73808
Version 4 UUIDs have the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx where x is any hexadecimal digit and y is one of 8, 9, A, or B. e.g. f47ac10b-58cc-4372-a567-0e02b2c3d479.
source: http://en.wikipedia.org/wiki/Uuid#Definition
Therefore, this is technically more correct:
/[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12}/
Upvotes: 144
Reputation: 5542
So, I think Richard Bronosky actually has the best answer to date, but I think you can do a bit to make it somewhat simpler (or at least terser):
re_uuid = re.compile(r'[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}', re.I)
Upvotes: 7
Reputation: 2064
@ivelin: UUID can have capitals. So you'll either need to toLowerCase() the string or use:
[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}
Would have just commented this but not enough rep :)
Upvotes: 184
Reputation: 19142
I agree that by definition your regex does not miss any UUID. However it may be useful to note that if you are searching especially for Microsoft's Globally Unique Identifiers (GUIDs), there are five equivalent string representations for a GUID:
"ca761232ed4211cebacd00aa0057b223"
"CA761232-ED42-11CE-BACD-00AA0057B223"
"{CA761232-ED42-11CE-BACD-00AA0057B223}"
"(CA761232-ED42-11CE-BACD-00AA0057B223)"
"{0xCA761232, 0xED42, 0x11CE, {0xBA, 0xCD, 0x00, 0xAA, 0x00, 0x57, 0xB2, 0x23}}"
Upvotes: 44
Reputation: 31280
By definition, a UUID is 32 hexadecimal digits, separated in 5 groups by hyphens, just as you have described. You shouldn't miss any with your regular expression.
http://en.wikipedia.org/wiki/Uuid#Definition
Upvotes: 9