Sergio Romero

Reputation: 6607

Recommendations on how to parse a string with tabular data

Consider the following string:

============================================================================================================================================================
grp-membership table
============================================================================================================================================================
mcast-grp-addr |vlan-id|mcast-src-addr |port                                      |state
---------------+-------+---------------+------------------------------------------+-------------------------------------------------------------------------
111.50.0.10     4000    0.0.0.0         1/1/4/20                                   full-view
111.60.1.0      4000    0.0.0.0         1/1/4/20                                   full-view
111.60.3.52     4000    0.0.0.0         1/1/4/20                                   full-view
111.60.4.80     4000    0.0.0.0         1/1/4/20                                   full-view
111.60.6.60     4000    0.0.0.0         1/1/4/20                                   full-view
------------------------------------------------------------------------------------------------------------------------------------------------------------
grp-membership count : 5
============================================================================================================================================================

If the source of this data were a file, it would be simple to parse line by line. Unfortunately, it is a string that apparently contains no \n or \r characters to mark where a line ends.

With my limited knowledge of regular expressions I am able to get the table name, the column names and the count at the bottom, but I have no idea how to extract the data, group it into records and put each field in the correct column.

What I would like is to have something like the following:

public class GroupMembership  
{  
   public string McastGrpAddr {get; set;}  
   public int VlanId {get;set;}  
   public string McastSrcAddr {get;set;}  
   public string Port {get;set;}  
   public string State {get;set;}  
}  

var whatever = new List<GroupMembership>();

Or something like that.

I will be parsing a few different strings with similar structures, so I would rather not hard-code anything.

What would be the simplest way to accomplish this? Are regular expressions a good approach, or is there a better way to do it?

Thank you.

Upvotes: 0

Views: 89

Answers (1)

Alexander Petrov

Reputation: 14231

Try this:

using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = "your string here";

string pattern = @"
(?<grp> \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} )  # pattern for mcast-grp-addr
\s+
(?<id> \d+ )                                  # pattern for vlan-id
\s+
(?<src> \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} )  # pattern for mcast-src-addr
\s+
(?<port> \d{1,3}/\d{1,3}/\d{1,3}/\d{1,3} )    # pattern for port 
\s+
(?<state> .+? )                               # pattern for state
(?= \d | -- | \r\n )                          # lookahead for digit or -- or newline
";

var matches = Regex.Matches(text, pattern, RegexOptions.IgnorePatternWhitespace);
var list = new List<GroupMembership>();

foreach (Match match in matches)
{
    var membership = new GroupMembership();

    membership.McastGrpAddr = match.Groups["grp"].Value;
    membership.VlanId = int.Parse(match.Groups["id"].Value);
    membership.McastSrcAddr = match.Groups["src"].Value;
    membership.Port = match.Groups["port"].Value;
    // The lazy .+? also captures the padding before the next row, so trim it.
    membership.State = match.Groups["state"].Value.Trim();

    list.Add(membership);
}

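A note on the lookahead: it ends the state field at the next digit (the start of the next row), at -- (the closing rule), or at a newline. Which alternative applies depends on the characters between full-view and the next IP address in your string.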
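
If hard-coding one pattern per column is a concern (the question mentions several tables with a similar structure), one alternative is to read the column names from the header row and split the data on whitespace. The following is only a sketch, not a tested solution for your real input: TableParser and Parse are hypothetical names, and it assumes the === / ---+--- / --- layout shown in the question and that every field is non-empty and contains no whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TableParser
{
    // Hypothetical helper: returns one dictionary per record, keyed by column name.
    // Assumption: every field is non-empty and contains no whitespace.
    public static List<Dictionary<string, string>> Parse(string text)
    {
        var rows = new List<Dictionary<string, string>>();

        // header = text between the last '=' of the title block and the "---+---" rule
        // data   = text between that rule and the next run of 10 or more dashes
        var m = Regex.Match(
            text,
            @"=(?<header>[^=]+?)-+(?:\+-+)+(?<data>.*?)-{10,}",
            RegexOptions.Singleline);
        if (!m.Success)
            return rows;

        // Column names are separated by '|' in the header row.
        string[] columns = m.Groups["header"].Value
            .Split('|')
            .Select(c => c.Trim())
            .ToArray();

        // Fields are separated by runs of whitespace (or newlines, if present).
        string[] tokens = m.Groups["data"].Value
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        // Every columns.Length consecutive tokens form one record.
        for (int i = 0; i + columns.Length <= tokens.Length; i += columns.Length)
        {
            var row = new Dictionary<string, string>();
            for (int c = 0; c < columns.Length; c++)
                row[columns[c]] = tokens[i + c];
            rows.Add(row);
        }
        return rows;
    }
}
```

Each record can then be read by column name, for example rows[0]["vlan-id"], and mapped onto a class such as GroupMembership if you want typed properties. If a column can be empty or contain spaces, this token-counting approach breaks, and slicing by the column widths taken from the ---+--- rule would be the safer route.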

Upvotes: 2
