Reputation: 3529
I know how to do this using the obvious way with string.split().
I am looking for more elegant and faster code, possibly using regex and/or LINQ/lambdas.
My input string looks like this: "GradeId1:StudentName1*StudentName2*...StudentNameN,GradeId2:StudentName1*StudentName2....StudentNameN,....GradeN:Student1*StudentName2*...StudentNameN"
Gist: one contiguous string of grades and students. GradeId is an int and the student name is a string. Grades are separated by commas and student names by stars.
It is possible that a particular grade has no students, as in "1:stud1*stud2*stud3,2,3". Here grades 2 and 3 have no students; only grade 1 has 3 students. My objective is to get a collection that I could iterate like this:
foreach (Grade g in mycollection)
{
// Index-based loop over the names of one grade
for (int i = 0; i < g.StudentNames.Length; i++)
Console.WriteLine(g.StudentNames[i]);
}
class Grade { public int GradeId; public string[] StudentNames; }
Regex and LINQ gurus, please advise. Thank you.
Upvotes: 2
Views: 1313
Reputation: 27943
Edit: Now using the OP's new input string.
string mystring = "GradeId1:StudentName1*StudentName2*StudentNameN,GradeId2:StudentName1*StudentName2*StudentNameN,GradeIdN:Student1*StudentName2*StudentNameN";
MatchCollection matches =
Regex.Matches(
mystring,
@"(?:GradeId(\w+)(?:(?=,)|\:(?:([\w ]+)(?:$|\*))*))");
var grades = matches.Cast<Match>().Select(
gradeMatch =>
new
{
Grade = gradeMatch.Groups[1].Value,
Students = gradeMatch.Groups[2].Captures
.Cast<Capture> ()
.Select (c => c.Value).ToList ()
});
foreach (var grade in grades)
{
Console.WriteLine("Grade: " + grade.Grade);
foreach (string student in grade.Students)
Console.WriteLine(" " + student);
}
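For this string, GradeId1:StudentName1*StudentName2*StudentNameN,GradeId2,GradeIdN:Student1*StudentName2*StudentNameN
the code produces the output below. Note that StudentNameN is missing from grade 1: the pattern only accepts a name that is followed by * or by the end of the input, so a name followed by a comma is not captured.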
Grade: 1
StudentName1
StudentName2
Grade: 2
Grade: N
Student1
StudentName2
StudentNameN
For the interested:
match[0].Value => GradeId1:StudentName1*StudentName2*
match[0].Groups[0].Value => GradeId1:StudentName1*StudentName2*
match[0].Groups[0].Captures[0].Value => GradeId1:StudentName1*StudentName2*
match[0].Groups[1].Value => 1
match[0].Groups[1].Captures[0].Value => 1
match[0].Groups[2].Value => StudentName2
match[0].Groups[2].Captures[0].Value => StudentName1
match[0].Groups[2].Captures[1].Value => StudentName2
match[1].Value => GradeId2
match[1].Groups[0].Value => GradeId2
match[1].Groups[0].Captures[0].Value => GradeId2
match[1].Groups[1].Value => 2
match[1].Groups[1].Captures[0].Value => 2
match[1].Groups[2].Value =>
match[2].Value => GradeIdN:Student1*StudentName2*StudentNameN
match[2].Groups[0].Value => GradeIdN:Student1*StudentName2*StudentNameN
match[2].Groups[0].Captures[0].Value => GradeIdN:Student1*StudentName2*StudentNameN
match[2].Groups[1].Value => N
match[2].Groups[1].Captures[0].Value => N
match[2].Groups[2].Value => StudentNameN
match[2].Groups[2].Captures[0].Value => Student1
match[2].Groups[2].Captures[1].Value => StudentName2
match[2].Groups[2].Captures[2].Value => StudentNameN
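The captures listed above stop at the last * in each grade, so a name that is followed by a comma is lost. One possible variant, sketched here on the assumption that names contain only word characters and spaces, makes the trailing * optional and the whole student list optional. The class name GradeRegexSketch and its Parse method are made up for illustration and are not part of the original answer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GradeRegexSketch
{
    // Hypothetical variant of the pattern above: the ":names" part is optional,
    // and a name may be followed by '*', ',' or the end of the input.
    private static readonly Regex GradePattern =
        new Regex(@"GradeId(\w+)(?::(?:([\w ]+)\*?)*)?");

    // Returns one (grade id, student names) pair per "GradeId..." entry.
    public static List<KeyValuePair<string, List<string>>> Parse(string input)
    {
        return GradePattern.Matches(input)
            .Cast<Match>()
            .Select(m => new KeyValuePair<string, List<string>>(
                m.Groups[1].Value,
                // Group 2 is captured once per student name.
                m.Groups[2].Captures.Cast<Capture>().Select(c => c.Value).ToList()))
            .ToList();
    }
}
```

Calling GradeRegexSketch.Parse on the string with GradeId2 and no students should give three entries, with an empty name list for grade 2.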
Upvotes: 2
Reputation: 7641
Assuming your data format is the way you described it, this may be what you're looking for. You should probably still use String.Split() to work with the input, since it's a delimited string, but you can at least turn it into a collection of anonymous types.
string input = "10:name1*20:name2*30:name3*40:name4*50:name5";
var data =
(
from pair in input.Split( '*' )
let student = pair.Split( ':' )
select new { Grade = int.Parse( student[ 0 ] ), Name = student[ 1 ] }
);
foreach( var student in data )
{
Console.WriteLine( student );
}
Edit:
It seems you have a 1:many grade -> student relationship. You could look into using a Lookup collection to easily get all the students in a given grade.
string input = "10:name1.1*name1.2*name1.3,20:name2.1*name2.2*,30:name3.1,40:name4.1*name4.2*name4.3,50:name5.1";
var studentData = ( Lookup<int,string[]> )
(
from
line in input.Split( ',' )
where
line.IndexOf( ':' ) > -1
let
grade = line.Substring( 0, line.IndexOf( ':' ) )
let
names = line.Remove( 0, grade.Length + 1 ).Split( '*' )
select
new { Grade = int.Parse( grade ), Students = names }
).ToLookup( s => s.Grade, s => s.Students );
foreach( IGrouping<int,string[]> gradeSet in studentData )
{
Console.WriteLine( gradeSet.Key );
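// Print the names themselves; passing the grouping to WriteLine only prints its type name.
foreach( string[] names in gradeSet )
Console.WriteLine( string.Join( ", ", names ) );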
}
Also, I realize this isn't the "linqy-est" solution, but hopefully it'll make your job easier.
Upvotes: 3
Reputation: 3688
Here's an answer using one (long) line of Linq (I prefer to use the extension methods directly, but you could use the short Linq syntax too). I'm not sure using Linq/extensions is more "elegant" or any simpler than doing it long-hand with nested ifs and the like. I will admit, there's something cool about a nice long Linq expression that gets a complex job done.
string input = "1:A*B*C,2:A*B,3:B*C*D";
var grades = input
.Split(',')
.Select(x => x.Split(':'))
.Select(x => x[1].Split('*').Select(n => new { GradeId = x[0], StudentName = n }))
.SelectMany(x => x)
.ToList();
This produces a List<T> of anonymous types with GradeId and StudentName fields, one entry per grade/student combination.
Edit: The revised question is a little easier. Here's how you could get the nested lists as requested using this technique:
var grades = input
.Split(',')
.Select(x => x.Split(':'))
.Select(x => new { GradeId = x[0], StudentNames = x[1].Split('*').ToList() })
.ToList();
You can then iterate like so:
foreach(var grade in grades)
{
//You could always use a foreach here too
for(int i = 0; i < grade.StudentNames.Count ; i++)
{
Console.WriteLine(grade.StudentNames[i]);
}
}
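The question also allows grades with no students (as in "1:stud1*stud2*stud3,2,3"), in which case there is no ':' to split on and x[1] does not exist. A minimal sketch of the same Split-based approach that tolerates this is shown here; the Grade class follows the shape given in the question, while GradeSplitSketch and Parse are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Shape taken from the question: an int id plus an array of names.
public class Grade
{
    public int GradeId;
    public string[] StudentNames;
}

public static class GradeSplitSketch
{
    public static List<Grade> Parse(string input)
    {
        return input
            .Split(',')
            // Split on the first ':' only; a grade without students yields one part.
            .Select(entry => entry.Split(new[] { ':' }, 2))
            .Select(parts => new Grade
            {
                GradeId = int.Parse(parts[0]),
                StudentNames = parts.Length > 1
                    ? parts[1].Split(new[] { '*' }, StringSplitOptions.RemoveEmptyEntries)
                    : new string[0]
            })
            .ToList();
    }
}
```

Because StudentNames is an array here, the loop from the question can use g.StudentNames.Length unchanged.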
Upvotes: 1
Reputation: 6850
In my experience, String.Split() tends to be the best option in most cases where it's workable. The one exception is when you're dealing with very large blocks of text that can't be read one line at a time (or similar), so that attacking them with Split() ends up cramming the heap full of large string arrays.
In those cases you could produce a composition of enumerator blocks. The inside of them might be a loop that uses String.IndexOf() to find successive delimiters, and then uses Substring() to yank out and yield the text between them. It helps to limit the number of strings that are on the heap at any one time, but stops short of treating a string as IEnumerable (which doesn't tend to perform as well in my experience).
For that matter, it might be fine to just use one block like that, and revert back to using String.Split() for handling its results.
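As a rough illustration of the enumerator-block idea described above, the following sketch walks the string with IndexOf() and yields each piece with Substring(). The names LazySplitSketch and Tokens are invented for this example, and it assumes a single-character delimiter.

```csharp
using System;
using System.Collections.Generic;

public static class LazySplitSketch
{
    // Yields the pieces of 'text' between occurrences of 'delimiter' one at a
    // time, so the full array of pieces is never allocated at once.
    public static IEnumerable<string> Tokens(string text, char delimiter)
    {
        int start = 0;
        while (true)
        {
            int next = text.IndexOf(delimiter, start);
            if (next < 0)
            {
                // No more delimiters: the rest of the string is the last piece.
                yield return text.Substring(start);
                yield break;
            }
            yield return text.Substring(start, next - start);
            start = next + 1;
        }
    }
}
```

A caller could iterate with foreach (string grade in LazySplitSketch.Tokens(input, ',')) and then fall back to String.Split() on each grade, which are usually small.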
Upvotes: 0
Reputation: 181
OpticalDelusion is right that Linq will definitely HURT performance. In general, Linq is convenient, but not fast.
Regex isn't useful for the actual parsing in complex string-splitting cases like this; it's more useful for finding a particular pattern in an arbitrary string or for whitelisting the string. So if you wanted to make sure that the input string is in the correct format, you could use a regex pattern like this:
@"^([a-zA-Z0-9]+:[a-zA-Z0-9]+(\*[a-zA-Z0-9]+)*)(,[a-zA-Z0-9]+:[a-zA-Z0-9]+(\*[a-zA-Z0-9]+)*)*$"
Basically, any letter or digit, one or more times, followed by a colon, then another sequence of letters or digits, then a '*' and another sequence of letters or digits, 0 or more times. This whole group, preceded by a comma, is then repeated 0 or more times.
Once you've ensured the string is in the proper format, you can do string.split() operations.
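A minimal sketch of this validate-then-split flow follows. It uses a variant pattern that also accepts grades with no students (the "1:stud1*stud2*stud3,2,3" case from the question), which is an assumption beyond the pattern given here; GradeFormatSketch and SplitIfValid are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GradeFormatSketch
{
    // Assumed format: an alphanumeric id, optionally followed by ':' and
    // '*'-separated alphanumeric names; entries are separated by ','.
    private static readonly Regex Format = new Regex(
        @"^[a-zA-Z0-9]+(:[a-zA-Z0-9]+(\*[a-zA-Z0-9]+)*)?" +
        @"(,[a-zA-Z0-9]+(:[a-zA-Z0-9]+(\*[a-zA-Z0-9]+)*)?)*$");

    // Validates the whole input first, then does the plain Split().
    public static string[] SplitIfValid(string input)
    {
        if (!Format.IsMatch(input))
            throw new FormatException("Input does not match the expected grade list format.");
        return input.Split(',');
    }
}
```

Each returned element can then be split on ':' and '*' without further checks.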
Upvotes: 2
Reputation: 1706
You can do Linq and lambda stuff like this but I don't think you will see a positive performance difference and it will be more code than if you just parsed it normally.
var grades = (from s in text select s).TakeWhile(a => !a.Equals(','));
Sorry I am not about to do the whole thing for you unless you really need help and really want to do it this way.
Upvotes: 0