Reputation: 335
I would like to retrieve values in string formatted like this :
public var any:int = 0;
public var anyId:Number = 2;
public var theEnd:Vector.<uint>;
public var test:Boolean = false;
public var others1:Vector.<int>;
public var firstValue:CustomType;
public var field2:Boolean = false;
public var secondValue:String = "";
public var isWorks:Boolean = false;
I want to store field name, type and value in a custom class Property :
public class Property
{
public string Name { get; set; }
public string Type { get; set; }
public string Value { get; set; }
}
And with a Regex expression get these values.
How can I do ?
Thanks
EDIT : I tried this but I don't know how to go further with vectors..etc
/public var ([a-zA-Z0-9]*):([a-zA-Z0-9]*)( = \"?([a-zA-Z0-9]*)\"?)?;/g
Upvotes: 1
Views: 1225
Reputation: 131591
You can use the following pattern:
\s*(?<vis>\w+?)\s+var\s+(?<name>\w+?)\s*:\s*(?<type>\S+?)(\s*=\s*(?<value>\S+?))?\s*;
to match each element in a line. Appending ?
after a quantifier results in a non-greedy match which makes the pattern a lot simpler - no need to negate all unwanted classes.
Values are optional, so the value group is wrapped in another, optional group (\s*=\s*(?<value>\S+?))?
Using the RegexOptions.Multiline
option means we don't have to worry about accidentally matching newlines.
The C# 6 syntax in the following example isn't required, but multiline string literals and interpolated strings make for much cleaner code.
var input= @"public var any:int = 0;
public var anyId:Number = 2;
public var theEnd:Vector.<uint>;
public var test:Boolean = false;
public var others1:Vector.<int>;
public var firstValue:CustomType;
public var field2:Boolean = false;
public var secondValue:String = """";
public var isWorks:Boolean = false;";
var pattern= @"\s*(?<vis>\w+?)\s+var\s+(?<name>\w+?)\s*:\s*(?<type>\S+?)(\s*=\s*(?<value>\S+?))?\s*;"
var regex = new Regex(pattern, RegexOptions.Multiline);
var results=regex.Matches(input);
foreach (Match m in results)
{
var g = m.Groups;
Console.WriteLine($"{g["name"],-15} {g["type"],-10} {g["value"],-10}");
}
var properties = (from m in results.OfType<Match>()
let g = m.Groups
select new Property
{
Name = g["name"].Value,
Type = g.["type"].Value,
Value = g["value"].Value
})
.ToList();
I would consider using a parser generator like ANTLR though, if I had to parse more complex input or if there are multiple patterns to match. Learning how to write the grammar takes some time, but once you learn it, it's easy to create parsers that can match input that would require very complicated regular expressions. Whitespace management also becomes a lot easier
In this case, the grammar could be something like:
property : visibility var name COLON type (EQUALS value)? SEMICOLON;
visibility : ALPHA+;
var : ALPHA ALPHA ALPHA;
name : ALPHANUM+;
type : (ALPHANUM|DOT|LEFT|RIGHT);
value : ALPHANUM
| literal;
literal : DOUBLE_QUOTE ALPHANUM* DOUBLE_QUOTE;
ALPHANUM : ALPHA
| DIGIT;
ALPHA : [A-Z][a-z];
DIGIT : [0-9];
...
WS : [\r\n\s] -> skip;
With a parser, adding eg comments would be as simple as adding comment
before SEMICOLON
in the property
rule and a new comment
rule that would match the pattern of a comment
Upvotes: 0
Reputation: 627103
Ok, posting my regex-based answer.
Your regex - /public var ([a-zA-Z0-9]*):([a-zA-Z0-9]*)( = \"?([a-zA-Z0-9]*)\"?)?;/g
- contains regex delimiters, and they are not supported in C#, and thus are treated as literal symbols. You need to remove them and the modifier g
since to obtain multiple matches in C# Regex.Matches
, or Regex.Match
with while
and Match.Success
/.NextMatch()
can be used.
The regex I am using is (?<=\s*var\s*)(?<name>[^=:\n]+):(?<type>[^;=\n]+)(?:=(?<value>[^;\n]+))?
. The newline symbols are included as negated character classes can match a newline character.
var str = "public var any:int = 0;\r\npublic var anyId:Number = 2;\r\npublic var theEnd:Vector.<uint>;\r\npublic var test:Boolean = false;\r\npublic var others1:Vector.<int>;\r\npublic var firstValue:CustomType;\r\npublic var field2:Boolean = false;\r\npublic var secondValue:String = \"\";\r\npublic var isWorks:Boolean = false;";
var rx = new Regex(@"(?<=\s*var\s*)(?<name>[^=:\n]+):(?<type>[^;=\n]+)(?:=(?<value>[^;\n]+))?");
var coll = rx.Matches(str);
var props = new List<Property>();
foreach (Match m in coll)
props.Add(new Property(m.Groups["name"].Value,m.Groups["type"].Value, m.Groups["value"].Value));
foreach (var item in props)
Console.WriteLine("Name = " + item.Name + ", Type = " + item.Type + ", Value = " + item.Value);
Or with LINQ:
var props = rx.Matches(str)
.OfType<Match>()
.Select(m =>
new Property(m.Groups["name"].Value,
m.Groups["type"].Value,
m.Groups["value"].Value))
.ToList();
And the class example:
public class Property
{
public string Name { get; set; }
public string Type { get; set; }
public string Value { get; set; }
public Property()
{}
public Property(string n, string t, string v)
{
this.Name = n;
this.Type = t;
this.Value = v;
}
}
NOTE ON PERFORMANCE:
The regex is not the quickest, but it certainly beats the one in the other answer. Here is a test performed at regexhero.net:
Upvotes: 2
Reputation: 186803
It seems, that you don't want regular expressions; in a simple case as you've provided:
String text =
@"public var any:int = 0;
public var anyId:Number = 2;
public var theEnd:Vector.<uint>;
public var test:Boolean = false;
public var others1:Vector.<int>;
public var firstValue:CustomType;
public var field2:Boolean = false;";
List<Property> result = text
.Split(new Char[] {'\r','\n'}, StringSplitOptions.RemoveEmptyEntries)
.Select(line => {
int varIndex = line.IndexOf("var") + "var".Length;
int columnIndex = line.IndexOf(":") + ":".Length;
int equalsIndex = line.IndexOf("="); // + "=".Length;
// '=' can be absent
equalsIndex = equalsIndex < 0 ? line.Length : equalsIndex + "=".Length;
return new Property() {
Name = line.Substring(varIndex, columnIndex - varIndex - 1).Trim(),
Type = line.Substring(columnIndex, columnIndex - varIndex - 1).Trim(),
Value = line.Substring(equalsIndex).Trim(' ', ';')
};
})
.ToList();
if text can contain comments and other staff, e.g.
"public (*var is commented out*) var sample: int = 123;;;; // another comment"
you have to implement a parser
Upvotes: 1