Reputation: 1564
I am using UA-Parser to create a table of devices for analytics. I have a CSV of user-agent strings from our server, and I am using the stock UA-Parser package for Node (ua-parser-js).
However, I am having difficulty parsing some Droid user-agent strings.
The current regex for Droid is:
/\s((milestone|droid[2x]?))[globa\s]*\sbuild\//i
The above matches
Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; DROIDX Build/4.5.1_57_DX8-51) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1,182
But does not match
Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; DROID RAZR Build/9.8.2O-72_VZW-16) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30,652
Mozilla/5.0 (Linux; U; Android 2.3.5; en-us; DROID X2 Build/4.5.1A-DTN-200-18) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1,152
How should I modify the regex so that it also matches the above strings?
Upvotes: 0
Views: 260
Reputation: 168715
To solve this we need to isolate the part of the string that is causing us a problem.
Let's cut the strings down and look only at the part that we're interested in:
DROIDX Build
compared with DROID RAZR Build
or DROID X2 Build
We can see that they all match the droid part, and the [2x] is optional, so that doesn't matter.
The problem is in the next bit: [globa\s]*
This allows zero or more characters, but only from this list: g, l, o, b, a, or whitespace. So whatever sits between the word droid (with or without a following 2 or X) and the final space before build must consist entirely of those characters.
We have RAZR and X2 in the failing strings. If any of the characters in those words is not in the above list, then the match fails. (As it turns out, almost none of the characters are in the list, but a single one would be enough to make it fail.)
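To make the failure concrete, here is a small sketch that lists which characters in each fragment fall outside the original character class. The helper name offendingChars is made up for this illustration, and the inputs are just the cut-down fragments from above.

```javascript
// Character class from the original pattern (case-insensitive, as in the regex).
const allowed = /[globa\s]/i;

// Returns the characters between "DROID[2X]?" and " Build/" that the class rejects.
// `offendingChars` is a hypothetical helper name used only for this sketch.
function offendingChars(fragment) {
  const middle = fragment
    .replace(/^DROID[2X]?/i, "") // drop the prefix the pattern already accepts
    .replace(/\sBuild\/$/i, ""); // drop the " Build/" tail the pattern requires
  return [...middle].filter((ch) => !allowed.test(ch));
}

for (const fragment of ["DROIDX Build/", "DROID RAZR Build/", "DROID X2 Build/"]) {
  console.log(fragment, offendingChars(fragment));
}
```

An empty list means nothing blocks the match; any entry in the list is enough to make the original pattern fail.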
So the quick and easy fix here is to add the characters r, z, x and 2 to the globa\s character class.
This will fix it for the given examples, i.e. it will now accept the RAZR or X2 in this section of the string.
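A rough sketch of that quick fix, written out as a full pattern and run against the three strings from the question (shortened here to the part before AppleWebKit). The last input, a DROID BIONIC fragment, is not from the question; it is an assumed extra case to show how narrow this fix is.

```javascript
// Original pattern with r, z, x and 2 added to the character class.
const quickFix = /\s((milestone|droid[2x]?))[globarzx2\s]*\sbuild\//i;

const questionStrings = [
  "Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; DROIDX Build/4.5.1_57_DX8-51)",
  "Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; DROID RAZR Build/9.8.2O-72_VZW-16)",
  "Mozilla/5.0 (Linux; U; Android 2.3.5; en-us; DROID X2 Build/4.5.1A-DTN-200-18)",
];

for (const ua of questionStrings) {
  console.log(quickFix.test(ua), ua);
}

// Assumed extra input (not in the question): I, N and C are not in the class.
const otherSuffix = "; DROID BIONIC Build/";
console.log(quickFix.test(otherSuffix), otherSuffix);
```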
However, to allow for other possible cases, you may want to be a bit more lenient and allow any alpha-numeric characters. It's up to you, but there's no predicting what UA strings are going to appear in the future.
Therefore, I would suggest replacing the whole globa bit with a-z0-9.
/\s((milestone|droid[2x]?))[a-z0-9\s]*\sbuild\//i
Even this may not pick up all possible variants that could appear, but that's the trouble with user agent strings; they're not exactly a well-defined format; they can contain pretty much anything.
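As a quick check, the sketch below runs this more lenient pattern over the three strings from the question (shortened to the part before AppleWebKit) and prints the first capture group. The function name droidModel is only illustrative. Note that the pattern now matches all three, but the captured text is still just the droid prefix.

```javascript
// Lenient pattern: any letters, digits or whitespace may sit before " build/".
const lenient = /\s((milestone|droid[2x]?))[a-z0-9\s]*\sbuild\//i;

// Hypothetical helper: returns capture group 1, or null when there is no match.
function droidModel(ua) {
  const match = lenient.exec(ua);
  return match ? match[1] : null;
}

const samples = [
  "Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; DROIDX Build/4.5.1_57_DX8-51)",
  "Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; DROID RAZR Build/9.8.2O-72_VZW-16)",
  "Mozilla/5.0 (Linux; U; Android 2.3.5; en-us; DROID X2 Build/4.5.1A-DTN-200-18)",
];

for (const ua of samples) {
  console.log(droidModel(ua)); // "DROIDX", then "DROID", then "DROID"
}
```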
[EDIT] The OP adds a request for the RAZR or X2 strings to be included in the returned result string.
The short answer is that this would mean moving the relevant part of the pattern into the bracketed section, alongside the droid pattern.
However, this does complicate things, because while we want those strings to be included, we may not want others which were previously excluded, i.e. the strings that previously matched the globa\s pattern. The problem here is that I don't have any examples of what those excluded strings may have been, or why they're excluded. And likewise, I don't know what strings we would want to include, beyond RAZR or X2. I would guess that we'd need to be relatively lenient, but it's not easy to know how to distinguish them without knowing what the possibilities are (and indeed, it may be very difficult even when we do know them).
Given the above, the only real option open to me is to suggest adding RAZR and X2 into the bracketed section, so that they are picked up specifically:
/\s((milestone|droid[2x]?(\s(razr|x2))?))[a-z0-9\s]*\sbuild\//i
This will match both the required strings.
The problem, of course, is that it won't match any other possible variants that haven't been described here. Allowing for more would require knowing more about what the possible variants are, but since we've only been asked to look at these specific examples, that's all I can really offer for now.
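One detail worth checking is what the first capture group actually returns. The sketch below compares two spellings of the optional group: one with a trailing \s inside it and one without. With the trailing \s, the group swallows the space that \sbuild needs, so the engine backtracks, skips the optional group, and captures only the droid prefix. The helper name capture and the shortened sample strings are assumptions for the illustration.

```javascript
// Optional group WITH a trailing \s: it consumes the space that \sbuild needs.
const withTrailingSpace =
  /\s((milestone|droid[2x]?(\s(razr|x2)\s)?))[a-z0-9\s]*\sbuild\//i;

// Optional group WITHOUT the trailing \s: the space stays available for \sbuild.
const withoutTrailingSpace =
  /\s((milestone|droid[2x]?(\s(razr|x2))?))[a-z0-9\s]*\sbuild\//i;

// Hypothetical helper: capture group 1 of `pattern`, or null if no match.
function capture(pattern, ua) {
  const match = pattern.exec(ua);
  return match ? match[1] : null;
}

const uaStrings = [
  "Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; DROIDX Build/4.5.1_57_DX8-51)",
  "Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; DROID RAZR Build/9.8.2O-72_VZW-16)",
  "Mozilla/5.0 (Linux; U; Android 2.3.5; en-us; DROID X2 Build/4.5.1A-DTN-200-18)",
];

for (const ua of uaStrings) {
  console.log(capture(withTrailingSpace, ua), "|", capture(withoutTrailingSpace, ua));
}
// DROIDX | DROIDX
// DROID | DROID RAZR
// DROID | DROID X2
```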
Upvotes: 1
Reputation: 1941
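What everyone else said, but a simpler version:
/\s((milestone|droid[2x]?))[globa\w\s]*\sbuild\//i
Just add a \w to the character class so that it also matches the Droid suffix. Note that the suffix is matched but is not part of the captured group.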
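A small sketch of what this pattern returns for the RAZR string from the question (shortened to the part before AppleWebKit). It also compares against a class of just [\w\s], on the assumption that the letters g, l, o, b, a add nothing once \w is present, since \w already covers them.

```javascript
const withWord = /\s((milestone|droid[2x]?))[globa\w\s]*\sbuild\//i;
// Assumed equivalent spelling: \w already includes g, l, o, b and a.
const shorter = /\s((milestone|droid[2x]?))[\w\s]*\sbuild\//i;

const razr =
  "Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; DROID RAZR Build/9.8.2O-72_VZW-16)";

const m1 = withWord.exec(razr);
const m2 = shorter.exec(razr);

console.log(m1[0].trim()); // whole match: "DROID RAZR Build/"
console.log(m1[1]);        // capture group 1: "DROID"
console.log(m2[0] === m1[0] && m2[1] === m1[1]); // true
```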
Upvotes: 0
Reputation: 574
This matches all three:
/\s(milestone|droid[x]?\s[^\s]*)[globa\s]*build\//i
It matches:
a whitespace character, then
either 'milestone', OR 'droid' followed by 0 or 1 'x' characters, a whitespace character, and zero or more characters that aren't whitespace, then
zero or more of the characters g, l, o, b, a, or whitespace, then
'build', then
the '/' character,
all in a case-insensitive manner.
It matches the DROIDX Build/ in:
Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; DROIDX Build/4.5.1_57_DX8-51) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1,182
The DROID RAZR Build/ in:
Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; DROID RAZR Build/9.8.2O-72_VZW-16) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30,652
The DROID X2 Build/ in:
Mozilla/5.0 (Linux; U; Android 2.3.5; en-us; DROID X2 Build/4.5.1A-DTN-200-18) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1,152
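For reference, a sketch of what the first capture group holds with this pattern, using the three strings shortened to the part before AppleWebKit. The helper name firstGroup is made up here. The last input, a DROID2 fragment, is an assumed case that is not in the question: because [2x] became [x], a name like DROID2 no longer matches.

```javascript
const pattern = /\s(milestone|droid[x]?\s[^\s]*)[globa\s]*build\//i;

// Hypothetical helper: capture group 1, or null when the pattern does not match.
function firstGroup(ua) {
  const match = pattern.exec(ua);
  return match ? match[1] : null;
}

const inputs = [
  "Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; DROIDX Build/4.5.1_57_DX8-51)",
  "Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; DROID RAZR Build/9.8.2O-72_VZW-16)",
  "Mozilla/5.0 (Linux; U; Android 2.3.5; en-us; DROID X2 Build/4.5.1A-DTN-200-18)",
];

for (const ua of inputs) {
  // JSON.stringify makes trailing whitespace in the capture visible.
  console.log(JSON.stringify(firstGroup(ua)));
}
// "DROIDX " (note the trailing space), "DROID RAZR", "DROID X2"

// Assumed extra input: the digit 2 is no longer accepted after "droid".
console.log(firstGroup("; DROID2 Build/")); // null
```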
Upvotes: 0
Reputation: 4880
If you only need to add RAZR and X2 support:
/\s((milestone|droid(?:2|x|\s+razr|\s+x2)?))[globa\s]*\sbuild\//i
Edit: Fair warning, I have no idea what the expected values can be, I just based that on the UA strings you posted in the question.
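A brief sketch of how this pattern behaves on the three strings from the question (shortened to the part before AppleWebKit). The helper name model is illustrative, and the DROID RAZR HD fragment is an assumed input, included only to show that suffixes outside the listed alternatives still fail.

```javascript
const strict = /\s((milestone|droid(?:2|x|\s+razr|\s+x2)?))[globa\s]*\sbuild\//i;

// Hypothetical helper: capture group 1, or null when there is no match.
function model(ua) {
  const match = strict.exec(ua);
  return match ? match[1] : null;
}

const agents = [
  "Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; DROIDX Build/4.5.1_57_DX8-51)",
  "Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; DROID RAZR Build/9.8.2O-72_VZW-16)",
  "Mozilla/5.0 (Linux; U; Android 2.3.5; en-us; DROID X2 Build/4.5.1A-DTN-200-18)",
];

for (const ua of agents) {
  console.log(model(ua)); // "DROIDX", "DROID RAZR", "DROID X2"
}

// Assumed extra input: "HD" is not covered by the alternation or by [globa\s].
console.log(model("; DROID RAZR HD Build/")); // null
```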
Upvotes: 0