Matthew Knudsen

Reputation: 213

Regex to parse a user agent string

I am a total noob to regex. I have a bunch of user agent strings that I want to parse.

Windows Phone Search (Windows Phone OS 7.10;Acer;Allegro;7.10;8860)
Windows Phone Search (Windows Phone OS 7.10;HTC;7 Mozart T8698;7.10;7713)
Windows Phone Search (Windows Phone OS 7.10;HTC;Radar C110e;7.10;7720)

How can I use regex to just extract:

A) Windows Phone OS 7.10 Acer Allegro

B) Windows Phone OS 7.10 HTC 7 Mozart

C) Windows Phone OS 7.10 HTC Radar

I have tried to use Split in the following way but to no avail:

private static string parse(string input) 
{ 
    input = input.Remove(0, input.IndexOf('(') + 1).Replace(')', ' ').Trim(); 
    string[] temp = input.Split(';'); 
    if (temp[2].Contains('T'))
    { 
        temp[2] = temp[2].Substring(0, temp[2].IndexOf('T')).Trim(); 
    } 
    StringBuilder sb = new StringBuilder(); 
    sb.Append(temp[0] + " "); 
    sb.Append(temp[1] + " "); 
    sb.Append(temp[2]); 
    return sb.ToString(); 
}

Upvotes: 2

Views: 6679

Answers (2)

Bohemian

Reputation: 425448

This regex will capture it:

(?<=\().*?;.*?;.*?(?=;)

As code it would be:

string s = Regex.Match(input, @"(?<=\().*?;.*?;.*?(?=;)").Value;

A breakdown of the regex:

  • (?<=\() = a "look behind" that asserts the previous char is a literal open bracket (
  • .*?; = a non-greedy match (it stops at the first ; rather than skipping past it) of everything up to and including the next ;. It appears twice, followed by a final .*? for the third field
  • (?=;) = a "look ahead" that asserts the next char is a literal semi-colon ;
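
To see what the pattern above actually returns, here is a minimal sketch. The class and method names (LookaroundDemo, FirstThreeFields) are made up for illustration. Note that the raw match still contains the ; separators, so this sketch swaps them for spaces, and that the model code (for example T8698) is still part of the third field.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: applies the lookaround pattern from this answer
    // and replaces the ';' separators in the match with spaces.
    public static string FirstThreeFields(string userAgent)
    {
        string match = Regex.Match(userAgent, @"(?<=\().*?;.*?;.*?(?=;)").Value;
        return match.Replace(';', ' ');
    }

    public static void Main()
    {
        Console.WriteLine(FirstThreeFields(
            "Windows Phone Search (Windows Phone OS 7.10;Acer;Allegro;7.10;8860)"));
        // prints: Windows Phone OS 7.10 Acer Allegro

        Console.WriteLine(FirstThreeFields(
            "Windows Phone Search (Windows Phone OS 7.10;HTC;7 Mozart T8698;7.10;7713)"));
        // prints: Windows Phone OS 7.10 HTC 7 Mozart T8698
    }
}
```

If the input contains no opening bracket, Regex.Match fails and .Value is an empty string, so the helper returns an empty string rather than throwing.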

Upvotes: 1

ΩmegaMan

Reputation: 31721

I use regular expressions because they were designed specifically for parsing text. Once you understand the basics of regex patterns, they become very useful in any text-processing situation.

In this pattern my goal is to separate each item out into named capture groups: Version, Phone, Type, Major and Minor. Once the regex has done that, I can use LINQ to extract the data as shown.

string pattern = @"
(?:OS\s)                     # Match but don't capture (MDC) OS, used as an anchor
(?<Version>\d\.\d+)          # Version of OS
(?:;)                        # MDC ;
(?<Phone>[^;]+)              # Get phone name up to ;
(?:;)                        # MDC ;
(?<Type>[^;]+)               # Get phone type up to ;
(?:;)                        # MDC ;
(?<Major>\d\.\d+)            # Major version
(?:;)                        # MDC ;
(?<Minor>\d+)                # Minor Version
";

string data =
@"Windows Phone Search (Windows Phone OS 7.10;Acer;Allegro;7.10;8860)
Windows Phone Search (Windows Phone OS 7.10;HTC;7 Mozart T8698;7.10;7713)
Windows Phone Search (Windows Phone OS 7.10;HTC;Radar C110e;7.10;7720)";

// IgnorePatternWhitespace lets us comment the pattern; it does not change how the regex matches
var phones = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
                  .OfType<Match>()
                  .Select (mt => new
                  {
                    Name = mt.Groups["Phone"].Value,
                    Type = mt.Groups["Type"].Value,
                    Version = string.Format( "{0}.{1}", mt.Groups["Major"].Value,
                                                        mt.Groups["Minor"].Value)
                  }
                  );

Console.WriteLine ("Phones Supported are:");

phones.Select(ph => string.Format("{0} of type {1} version ({2})", ph.Name, ph.Type, ph.Version))
      .ToList()
      .ForEach(Console.WriteLine);

/* Output
Phones Supported are:
Acer of type Allegro version (7.10.8860)
HTC of type 7 Mozart T8698 version (7.10.7713)
HTC of type Radar C110e version (7.10.7720)
*/
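
The question asked for output such as "Windows Phone OS 7.10 HTC 7 Mozart", without the model code. The same named-group idea could be adapted for that along these lines. This is only a sketch: the names UserAgentSummary, Summarize, Fields and ModelCode are made up here, and the rule for spotting a model code (a trailing token that starts with an uppercase letter followed by a digit, like T8698 or C110e) is an assumption based on the three sample strings, not a general rule for user agents.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UserAgentSummary
{
    // Captures the first three ';'-separated fields after the opening bracket.
    private static readonly Regex Fields = new Regex(
        @"\((?<Os>[^;]+);(?<Phone>[^;]+);(?<Type>[^;]+);");

    // Assumption: a model code is a trailing token such as "T8698" or "C110e".
    private static readonly Regex ModelCode = new Regex(@"\s+[A-Z]\d\w*$");

    // Returns "OS Phone Type" with any trailing model code removed,
    // or null when the input does not have the expected layout.
    public static string Summarize(string userAgent)
    {
        Match m = Fields.Match(userAgent);
        if (!m.Success) return null;

        string type = ModelCode.Replace(m.Groups["Type"].Value, "");
        return string.Join(" ", m.Groups["Os"].Value, m.Groups["Phone"].Value, type);
    }

    public static void Main()
    {
        string[] samples =
        {
            "Windows Phone Search (Windows Phone OS 7.10;Acer;Allegro;7.10;8860)",
            "Windows Phone Search (Windows Phone OS 7.10;HTC;7 Mozart T8698;7.10;7713)",
            "Windows Phone Search (Windows Phone OS 7.10;HTC;Radar C110e;7.10;7720)"
        };

        foreach (string sample in samples)
            Console.WriteLine(Summarize(sample));
        // prints:
        // Windows Phone OS 7.10 Acer Allegro
        // Windows Phone OS 7.10 HTC 7 Mozart
        // Windows Phone OS 7.10 HTC Radar
    }
}
```

The model-code pattern would need adjusting for devices whose names legitimately end in such a token.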

Upvotes: 1
