Reputation: 196539
I have the following string:
string fullString = "group = '2843360' and (team in ('TEAM1', 'TEAM2','TEAM3'))"
And I want to parse out of this string into
string group = ParseoutGroup(fullString); // Expect "2843360"
string[] teams = ParseoutTeamNames(fullString); // Expect array with three items
In terms of the example of full string, I could have one or many teams listed (not always three as in above).
I have this partially working, but my code feels very hacky and not very future proof, so I wanted to see if there was a better regular expression solution here or a more elegant way to parse these values out of this full string? There could be other things added later to the string, so I want this to be as foolproof as possible.
Upvotes: 3
Views: 1088
Reputation: 891
An addition to @BrunoLM's solution:
(Worth the extra lines if you'll have more variables to check later on):
You can split the string on the "and" keyword and have a function to check each clause against appropriate regex statement and return the desired value.
(Untested code, but it should deliver the idea.)
statments = statment.split('and')
//So now:
//statments[0] = "group = '2843360' "
//statments[1] = "(team in ('TEAM1', 'TEAM2','TEAM3'))"
foreach s in statments {
if (s.contains('group') group = RegexFunctionToExtract_GroupValue(s) ;
if (s.contains('team') teams = RegexFunctionToExtract_TeamValue(s) ;
}
I believe that this approach will deliver cleaner, easy-to-maintain code and a slight optimization.
Of course this approach doesn't expect an "OR" clause. However, it can be done with a little more tweaking.
Upvotes: 0
Reputation: 460158
There's probably a regex solution for this, but if the format is strict I try efficient string methods first. The following works with your input.
I'm using a custom class, TeamGroup
, to encapsulate complexity and to hold all relevant properties in one object:
public class TeamGroup
{
public string Group { get; set; }
public string[] Teams { get; set; }
public static TeamGroup ParseOut(string fullString)
{
TeamGroup tg = new TeamGroup{ Teams = new string[]{ } };
int index = fullString.IndexOf("group = '");
if (index >= 0)
{
index += "group = '".Length;
int endIndex = fullString.IndexOf("'", index);
if (endIndex >= 0)
{
tg.Group = fullString.Substring(index, endIndex - index).Trim(' ', '\'');
endIndex += 1;
index = fullString.IndexOf(" and (team in (", endIndex);
if (index >= 0)
{
index += " and (team in (".Length;
endIndex = fullString.IndexOf(")", index);
if (endIndex >= 0)
{
string allTeamsString = fullString.Substring(index, endIndex - index);
tg.Teams = allTeamsString.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(t => t.Trim(' ', '\''))
.ToArray();
}
}
}
}
return tg;
}
}
You would use it in this way:
string fullString = "group = '2843360' and (team in ('TEAM1', 'TEAM2','TEAM3'))";
TeamGroup tg = TeamGroup.ParseOut(fullString);
Console.Write("Group: {0} Teams: {1}", tg.Group, string.Join(", ", tg.Teams));
Outputs:
Group: 2843360 Teams: TEAM1, TEAM2, TEAM3
Upvotes: 1
Reputation: 100351
I managed to do that using regular expressions:
var str = "group = '2843360' and (team in ('TEAM1', 'TEAM2','TEAM3'))";
// Grabs the group ID
var group = Regex.Match(str, @"group = '(?<ID>\d+)'", RegexOptions.IgnoreCase)
.Groups["ID"].Value;
// Grabs everything inside teams parentheses
var teams = Regex.Match(str, @"team in \((?<Teams>(\s*'[^']+'\s*,?)+)\)", RegexOptions.IgnoreCase)
.Groups["Teams"].Value;
// Trim and remove single quotes
var teamsArray = teams.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(s =>
{
var trimmed = s.Trim();
return trimmed.Substring(1, trimmed.Length - 2);
}).ToArray();
The result will be:
string[] { "TEAM1", "TEAM2", "TEAM3" }
Upvotes: 4
Reputation: 2315
In the simplest case regular expression might be the best answer. Unfortunately, in this case, it seems that we need to parse a subset of SQL language. While it is possible to solve this with regular expressions, they are not designed to parse complex languages (nested brackets and escaped strings).
It is also possible that the requirements will evolve over time and it will be required to parse more complex structures.
If company policy allows, I will chose to build internal DSL in order to parse this string.
One of my favorite tools to build internal DLSs is called Sprache
Below you can find an example parser using internal DSL approach.
In the code I've defined primitives to handle required SQL operators and composed final parser out of those.
[Test]
public void Test()
{
string fullString = "group = '2843360' and (team in ('TEAM1', 'TEAM2','TEAM3'))";
var resultParser =
from @group in OperatorEquals("group")
from @and in OperatorEnd()
from @team in Brackets(OperatorIn("team"))
select new {@group, @team};
var result = resultParser.Parse(fullString);
Assert.That(result.group, Is.EqualTo("2843360"));
Assert.That(result.team, Is.EquivalentTo(new[] {"TEAM1", "TEAM2", "TEAM3"}));
}
private static readonly Parser<char> CellSeparator =
from space1 in Parse.WhiteSpace.Many()
from s in Parse.Char(',')
from space2 in Parse.WhiteSpace.Many()
select s;
private static readonly Parser<char> QuoteEscape = Parse.Char('\\');
private static Parser<T> Escaped<T>(Parser<T> following)
{
return from escape in QuoteEscape
from f in following
select f;
}
private static readonly Parser<char> QuotedCellDelimiter = Parse.Char('\'');
private static readonly Parser<char> QuotedCellContent =
Parse.AnyChar.Except(QuotedCellDelimiter).Or(Escaped(QuotedCellDelimiter));
private static readonly Parser<string> QuotedCell =
from open in QuotedCellDelimiter
from content in QuotedCellContent.Many().Text()
from end in QuotedCellDelimiter
select content;
private static Parser<string> OperatorEquals(string column)
{
return
from c in Parse.String(column)
from space1 in Parse.WhiteSpace.Many()
from opEquals in Parse.Char('=')
from space2 in Parse.WhiteSpace.Many()
from content in QuotedCell
select content;
}
private static Parser<bool> OperatorEnd()
{
return
from space1 in Parse.WhiteSpace.Many()
from c in Parse.String("and")
from space2 in Parse.WhiteSpace.Many()
select true;
}
private static Parser<T> Brackets<T>(Parser<T> contentParser)
{
return from open in Parse.Char('(')
from space1 in Parse.WhiteSpace.Many()
from content in contentParser
from space2 in Parse.WhiteSpace.Many()
from close in Parse.Char(')')
select content;
}
private static Parser<IEnumerable<string>> ComaSeparated()
{
return from leading in QuotedCell
from rest in CellSeparator.Then(_ => QuotedCell).Many()
select Cons(leading, rest);
}
private static Parser<IEnumerable<string>> OperatorIn(string column)
{
return
from c in Parse.String(column)
from space1 in Parse.WhiteSpace
from opEquals in Parse.String("in")
from space2 in Parse.WhiteSpace.Many()
from content in Brackets(ComaSeparated())
from space3 in Parse.WhiteSpace.Many()
select content;
}
private static IEnumerable<T> Cons<T>(T head, IEnumerable<T> rest)
{
yield return head;
foreach (T item in rest)
yield return item;
}
Upvotes: 7
Reputation: 23374
If fullString is not machine generated, you may need to add some error catching, but this will work out of the box, and give you a test to work against.
public string ParseoutGroup(string fullString)
{
var matches = Regex.Matches(fullString, @"group\s?=\s?'([^']+)'", RegexOptions.IgnoreCase);
return matches[0].Groups[1].Captures[0].Value;
}
public string[] ParseoutTeamNames(string fullString)
{
var teams = new List<string>();
var matches = Regex.Matches(fullString, @"team\s?in\s?\((\s*'([^']+)',?\s*)+\)", RegexOptions.IgnoreCase);
foreach (var capture in matches[0].Groups[2].Captures)
{
teams.Add(capture.ToString());
}
return teams.ToArray();
}
[Test]
public void parser()
{
string test = "group = '2843360' and (team in ('team1', 'team2', 'team3'))";
var group = ParseoutGroup(test);
Assert.AreEqual("2843360",group);
var teams = ParseoutTeamNames(test);
Assert.AreEqual(3, teams.Count());
Assert.AreEqual("team1", teams[0]);
Assert.AreEqual("team2", teams[1]);
Assert.AreEqual("team3", teams[2]);
}
Upvotes: 0
Reputation: 904
I think you will need to look into a tokenization process in order to get the desired result and take into consideration the order of execution established by the parenthesis. You may be able to use the shunting-yard algorithm to assist with tokenization and execution order.
The advantage of the shunting-yard is that it allows you to define tokens that can be later used to property parse the string and execute the proper operation. While it normally applies to mathematical order of operations it can be adapted to fit your purpose.
Here is some information:
http://en.wikipedia.org/wiki/Shunting-yard_algorithm http://www.slideshare.net/grahamwell/shunting-yard
Upvotes: 0