Gullu

Reputation: 3529

C#: one-to-many string parsing help using regex/lambdas

I know how to do this using the obvious way with string.split().

I am looking for more elegant and faster code, possibly using regex and/or LINQ/lambdas.

If my input string is like this "GradeId1:StudentName1*StudentName2*...StudentNameN,GradeId2:StudentName1*StudentName2....StudentNameN,....GradeN:Student1*StudentName2*...StudentNameN"

Gist: one contiguous string of grades and students. GradeId is an int, student name is a string. Grades are separated by commas and student names by stars.

A grade may have no students, as in "1:stud1*stud2*stud3,2,3". Here grades 2 and 3 have no students; only grade 1 has 3 students. My objective is to get a collection that I could iterate like this:

foreach (Grade g in mycollection)
{
  for (int i = 0; i < g.StudentNames.Length; i++)
     Console.WriteLine(g.StudentNames[i]);
}

class Grade { public int GradeId; public string[] StudentNames; }

Regex and LINQ gurus, please advise. Thank you.

Upvotes: 2

Views: 1313

Answers (6)

agent-j

Reputation: 27943

Edit: Now using the OP's new input string.

string mystring = "GradeId1:StudentName1*StudentName2*StudentNameN,GradeId2:StudentName1*StudentName2*StudentNameN,GradeIdN:Student1*StudentName2*StudentNameN";
MatchCollection matches = 
   Regex.Matches(
      mystring,
      @"(?:GradeId(\w+)(?:(?=,)|\:(?:([\w ]+)(?:$|\*))*))");

var grades = matches.Cast<Match>().Select(
   gradeMatch => 
      new
      {
         Grade = gradeMatch.Groups[1].Value,
         Students = gradeMatch.Groups[2].Captures
            .Cast<Capture> ()
            .Select (c => c.Value).ToList ()
      });

foreach (var grade in grades)
{
   Console.WriteLine("Grade: " + grade.Grade);
   foreach (string student in grade.Students)
      Console.WriteLine("   " + student);
}

For the input string GradeId1:StudentName1*StudentName2*StudentNameN,GradeId2,GradeIdN:Student1*StudentName2*StudentNameN, this produces the following output:

Grade: 1
   StudentName1
   StudentName2
Grade: 2
Grade: N
   Student1
   StudentName2
   StudentNameN

For the interested:

match[0].Value => GradeId1:StudentName1*StudentName2*
match[0].Groups[0].Value => GradeId1:StudentName1*StudentName2*
match[0].Groups[0].Captures[0].Value => GradeId1:StudentName1*StudentName2*
match[0].Groups[1].Value => 1
match[0].Groups[1].Captures[0].Value => 1
match[0].Groups[2].Value => StudentName2
match[0].Groups[2].Captures[0].Value => StudentName1
match[0].Groups[2].Captures[1].Value => StudentName2
match[1].Value => GradeId2
match[1].Groups[0].Value => GradeId2
match[1].Groups[0].Captures[0].Value => GradeId2
match[1].Groups[1].Value => 2
match[1].Groups[1].Captures[0].Value => 2
match[1].Groups[2].Value =>
match[2].Value => GradeIdN:Student1*StudentName2*StudentNameN
match[2].Groups[0].Value => GradeIdN:Student1*StudentName2*StudentNameN
match[2].Groups[0].Captures[0].Value => GradeIdN:Student1*StudentName2*StudentNameN
match[2].Groups[1].Value => N
match[2].Groups[1].Captures[0].Value => N
match[2].Groups[2].Value => StudentNameN
match[2].Groups[2].Captures[0].Value => Student1
match[2].Groups[2].Captures[1].Value => StudentName2
match[2].Groups[2].Captures[2].Value => StudentNameN
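
Note that in the dump above, match[0] stops at "StudentName2*": the pattern only accepts a name that is followed by "*" or by the end of the string, so the last student before a comma (StudentNameN in grade 1) is not captured. The following is a minimal sketch of one possible variant, assuming names contain only word characters and spaces as in the original pattern. The GradeRegexSketch class and its Parse method are hypothetical names used for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GradeRegexSketch
{
    // Hypothetical variant of the pattern above: the ":students" part is
    // optional, and each name may be followed by an optional "*" separator.
    private static readonly Regex GradePattern =
        new Regex(@"GradeId(\w+)(?::(?:([\w ]+)\*?)*)?");

    // Returns one (grade, students) pair per grade found in the input.
    public static List<KeyValuePair<string, List<string>>> Parse(string input)
    {
        return GradePattern.Matches(input)
            .Cast<Match>()
            .Select(m => new KeyValuePair<string, List<string>>(
                m.Groups[1].Value,
                // Captures is empty when the grade has no students.
                m.Groups[2].Captures.Cast<Capture>().Select(c => c.Value).ToList()))
            .ToList();
    }
}
```

With this variant, a grade written without a colon (such as GradeId2) still matches and simply yields an empty student list.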

Upvotes: 2

Brandon Moretz

Reputation: 7641

Assuming your data format is the way you described it, this may be what you're looking for. You should probably still use String.Split() to work with the input, since it's a delimited string, but you can at least turn it into a collection of anonymous types.

string input = "10:name1*20:name2*30:name3*40:name4*50:name5";

var data =
(
    from pair in input.Split( '*' )
    let student = pair.Split( ':' )
    select new { Grade = int.Parse( student[ 0 ] ), Name = student[ 1 ] }
);

foreach( var student in data )
{
    Console.WriteLine( student );
}

Edit:

It seems you have a 1:many grade -> student relationship? Maybe you should look into using a Lookup collection to easily get all the students in a given grade.

string input = "10:name1.1*name1.2*name1.3,20:name2.1*name2.2*,30:name3.1,40:name4.1*name4.2*name4.3,50:name5.1";

var studentData = ( Lookup<int,string[]> )
(
    from 
        line in input.Split( ',' )
    where 
        line.IndexOf( ':' ) > -1
    let 
        grade = line.Substring( 0, line.IndexOf( ':' ) )
    let 
        names = line.Remove( 0, grade.Length + 1 ).Split( '*' )
    select 
        new { Grade = int.Parse( grade ), Students = names }
).ToLookup( s => s.Grade, s => s.Students );

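foreach( IGrouping<int,string[]> gradeSet in studentData )
{
    Console.WriteLine( gradeSet.Key );
    // Each element of the grouping is one string[] of student names.
    foreach( string[] names in gradeSet )
        Console.WriteLine( "   " + string.Join( ", ", names ) );
}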

Also, I realize this isn't the "linqy-est" solution, but hopefully it'll make your job easier.

Upvotes: 3

daveaglick

Reputation: 3688

Here's an answer using one (long) line of Linq (I prefer to use the extension methods directly, but you could use the short Linq syntax too). I'm not sure using Linq/extensions is more "elegant" or any simpler than doing it long-hand with nested ifs and the like. I will admit, there's something cool about a nice long Linq expression that gets a complex job done.

string input = "1:A*B*C,2:A*B,3:B*C*D";
var grades = input
  .Split(',')
  .Select(x => x.Split(':'))
  .Select(x => x[1].Split('*').Select(n => new { GradeId = x[0], StudentName = n }))
  .SelectMany(x => x)
  .ToList();

This produces a List<T> of anonymous types with GradeId and StudentName fields for all combinations.

Edit: The revised question is a little easier. Here's how you could get the nested lists as requested using this technique:

var grades = input
  .Split(',')
  .Select(x => x.Split(':'))
  .Select(x => new { GradeId = x[0], StudentNames = x[1].Split('*').ToList() })
  .ToList();

You can then iterate like so:

foreach(var grade in grades)
{
  //You could always use a foreach here too
  for(int i = 0; i < grade.StudentNames.Count ; i++)
  {
    Console.WriteLine(grade.StudentNames[i]);
  }
}
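
The question also allows grades with no students (for example "1:stud1*stud2*stud3,2,3"), in which case x[1] above would be out of range. Here is a minimal sketch of the same Split approach that tolerates such grades, assuming a Grade class shaped like the one in the question. The GradeParser name is hypothetical, and int.Parse will still throw on a grade id that is not a number.

```csharp
using System;
using System.Linq;

public class Grade
{
    public int GradeId { get; set; }
    public string[] StudentNames { get; set; }
}

public static class GradeParser
{
    public static Grade[] Parse(string input)
    {
        return input
            .Split(',')
            // Limit to 2 parts: the id, and everything after the first ':'.
            .Select(segment => segment.Split(new[] { ':' }, 2))
            .Select(parts => new Grade
            {
                GradeId = int.Parse(parts[0]),
                // A segment without ':' is a grade with no students.
                StudentNames = parts.Length > 1
                    ? parts[1].Split(new[] { '*' }, StringSplitOptions.RemoveEmptyEntries)
                    : new string[0]
            })
            .ToArray();
    }
}
```

Because StudentNames is an array here, the loop from the question (using Length) works unchanged on the result of GradeParser.Parse(input).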

Upvotes: 1

Sean U

Reputation: 6850

In my experience, String.Split() tends to be the best option in most cases where it's workable. The one exception is when you're dealing with very large blocks of text that can't be read one line at a time (or similar), so that attacking it with Split() will end up cramming the heap full of large string arrays.

In those cases you could produce a composition of enumerator blocks. The inside of them might be a loop that uses String.IndexOf() to find successive delimiters, and then uses Substring() to yank out and yield the text between them. It helps to limit the number of strings that are on the heap at any one time, but stops short of treating a string as IEnumerable (which doesn't tend to perform as well in my experience).

For that matter, it might be fine to just use one block like that, and revert back to using String.Split() for handling its results.
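
As a rough sketch of the enumerator-block idea described above, assuming a single-character delimiter (the LazySplit class and Tokens method are hypothetical names, not part of the framework):

```csharp
using System.Collections.Generic;

public static class LazySplit
{
    // Yields the pieces of `text` between occurrences of `delimiter`
    // one at a time, instead of building the whole array up front.
    public static IEnumerable<string> Tokens(string text, char delimiter)
    {
        int start = 0;
        while (true)
        {
            int next = text.IndexOf(delimiter, start);
            if (next < 0)
            {
                // No more delimiters: the rest of the string is the last piece.
                yield return text.Substring(start);
                yield break;
            }
            yield return text.Substring(start, next - start);
            start = next + 1;
        }
    }
}
```

The outer loop could then be foreach (string gradeSegment in LazySplit.Tokens(input, ',')), with String.Split() applied to each (much smaller) segment.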

Upvotes: 0

Greggo

Reputation: 181

OpticalDelusion is right that Linq will definitely HURT performance. In general, Linq is convenient, but not fast.

Regex isn't useful for the actual parsing in complex string splitting cases like this - it's more useful for finding a particular pattern in an arbitrary string or whitelisting the string. So if you wanted to make sure that the input string is in the correct format, you could use a regex pattern like this:

"^([a-zA-Z0-9]+:[a-zA-Z0-9]+(\*[a-zA-Z0-9]+)*)(,[a-zA-Z0-9]+:[a-zA-Z0-9]+(\*[a-zA-Z0-9]+)*)*$"

Basically, any character or number, one or more times, followed by a colon, then another sequence of letters or numbers and then a '*' and another sequence of letters or numbers 0 or more times. This is then repeated 0 or more times.

Once you've ensured the string is in the proper format, you can do string.split() operations.
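
A minimal sketch of such a validation step, assuming numeric grade ids, alphanumeric names, and (as in the question) grades that may have no students at all. The GradeInputValidator name and this particular pattern are illustrative assumptions.

```csharp
using System.Text.RegularExpressions;

public static class GradeInputValidator
{
    // Assumed format: digits for the grade id, optionally followed by ':'
    // and one or more '*'-separated alphanumeric names; grades are
    // separated by commas.
    private static readonly Regex Format = new Regex(
        @"^\d+(:[a-zA-Z0-9]+(\*[a-zA-Z0-9]+)*)?(,\d+(:[a-zA-Z0-9]+(\*[a-zA-Z0-9]+)*)?)*$");

    public static bool IsValid(string input)
    {
        return Format.IsMatch(input);
    }
}
```

A caller would check GradeInputValidator.IsValid(input) first and only then run the Split() calls.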

Upvotes: 2

Alec

Reputation: 1706

You can do Linq and lambda stuff like this, but I don't think you will see a positive performance difference, and it will be more code than if you just parsed it normally.

var grades = (from s in text select s).TakeWhile(a => !a.Equals(','));

Sorry I am not about to do the whole thing for you unless you really need help and really want to do it this way.

Upvotes: 0
