Reputation: 622
Searched around a bit, but I only found cases where splitting by commas or similar would work. This case is different.
To explain my problem I'll show a tiny example:
JAN 01 00:00:01 <Admin> Action, May have spaces etc.
(This is a log entry)
I'd like to parse this string into several variables. The first part is obviously a date, without the year. The login name is listed between the <>'s, and the log entry follows it.
The configuration should have something like this:
{month} {day} {hour}:{minute}:{second} <{login}> {the_rest}
This will allow changes without having the whole thing hardcoded (using splits etc).
I think using Regex may be useful here, but I don't know much about it, or whether it is usable in this case at all. Speed does not matter a lot; I just don't know how to achieve this.
Thanks,
~Tgys
Upvotes: 1
Views: 146
Reputation: 437336
Regular expressions are indeed the correct tool here. First, let's see how you can use a hardcoded regular expression to parse this log.
var str = "JAN 01 00:00:01 <Admin> Action, May have spaces etc.";
var re = new Regex("^" +
@"(?<month>(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC))" +
" " +
@"(?<day>\d+)" +
" " +
@"(?<hour>\d+)" +
":" +
@"(?<the_rest>.*)" +
"$");
var match = re.Match(str);
What we did here is create a regular expression piece-by-piece using named capturing groups. For brevity I didn't capture all the relevant information, and I didn't spend too much time considering what is valid input in the context of each group (e.g. day will match 999, although that's not a valid day). All this can come later; for now, see it in action.
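To make the result of the hardcoded expression concrete, here is a minimal sketch of reading the named groups back out of the match. The class name HardcodedDemo and the method MatchEntry are made up for illustration; the pattern is the same one as above, just written as a single verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HardcodedDemo
{
    // Hypothetical helper: matches one log line with the hardcoded pattern.
    public static Match MatchEntry(string line)
    {
        var re = new Regex(
            @"^(?<month>JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)" +
            @" (?<day>\d+) (?<hour>\d+):(?<the_rest>.*)$");
        return re.Match(line);
    }

    public static void Main()
    {
        var match = MatchEntry("JAN 01 00:00:01 <Admin> Action, May have spaces etc.");
        if (match.Success)
        {
            // Named groups are looked up by name on match.Groups.
            Console.WriteLine(match.Groups["month"].Value);    // JAN
            Console.WriteLine(match.Groups["day"].Value);      // 01
            Console.WriteLine(match.Groups["hour"].Value);     // 00
            Console.WriteLine(match.Groups["the_rest"].Value); // 00:01 <Admin> Action, May have spaces etc.
        }
    }
}
```

Note that the_rest still contains the minutes, seconds and login here, because this shortened pattern stops capturing separately after the hour.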
The next step is to nicely pull out the definition of each capturing group into a dictionary:
var groups = new Dictionary<string, string>
{
{ "month", "(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)" },
{ "day", @"\d+" },
{ "hour", @"\d+" },
{ "the_rest", ".*" },
};
Given this, we can now construct the same regex with
var re = new Regex("^" +
string.Format("(?<month>{0})", groups["month"]) +
" " +
string.Format("(?<day>{0})", groups["day"]) +
" " +
string.Format("(?<hour>{0})", groups["hour"]) +
":" +
string.Format("(?<the_rest>{0})", groups["the_rest"]) +
"$");
OK, this is starting to look like something that can be constructed dynamically.
Let's say we want to construct it from a specification that looks like
"{month} {day} {hour}:{the_rest}"
How to do this? With another regular expression! Specifically, we will use the overload of Regex.Replace that enables replacement of a match with the result of a function:
var format = "{month} {day} {hour}:{the_rest}";
var result = Regex.Replace(format, @"\{(\w+)\}", m => groups[m.Groups[1].Value]);
See this in action before coming back.
At this point, we can pass in a format specification and get back a regular expression that matches the input based on this format. What's left? To translate the results of matching the regular expression to the input back to a "dynamic" structure:
var format = "{month} {day} {hour}:{the_rest}";
var re = Regex.Replace(format,
@"\{(\w+)\}",
m => string.Format("(?<{0}>{1})", m.Groups[1].Value, groups[m.Groups[1].Value]));
var regex = new Regex("^" + re + "$", RegexOptions.ExplicitCapture);
var match = regex.Match(str);
At this point, we can use:
match.Success to see if the dynamically constructed expression matches the input
regex.GetGroupNames() to get the names of the groups used in parsing
match.Groups to get the results of parsing each group
So let's put them in a dictionary:
var results = regex.GetGroupNames().ToDictionary(n => n, n => match.Groups[n].Value);
You can now create a method Parse that allows this:
var input = "JAN 01 00:00:01 <Admin> Action, May have spaces etc.";
var format = "{month} {day} {hour}:{the_rest}";
var results = Parse(input, format);
Parse will recognize (but not allow the user to modify) expressions such as "{month}", while at the same time allowing the user to mix and match these expressions freely in order to parse the input.
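Putting the pieces together, Parse could be sketched roughly as follows. This is only one possible shape, not a definitive implementation: the class name LogParser is made up, the "minute", "second" and "login" entries are assumptions added so that the format from the question can be used, and returning null on a failed match is an arbitrary choice. Literal text in the format is passed to the regex engine unescaped, so characters such as "." or "(" in the format would need extra handling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LogParser
{
    // Known placeholders and the pattern each one stands for.
    // "minute", "second" and "login" are assumed additions for illustration.
    private static readonly Dictionary<string, string> Groups = new Dictionary<string, string>
    {
        { "month", "(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)" },
        { "day", @"\d+" },
        { "hour", @"\d+" },
        { "minute", @"\d+" },
        { "second", @"\d+" },
        { "login", "[^>]+" },
        { "the_rest", ".*" },
    };

    // Returns placeholder name -> captured text, or null if the input does not match.
    // An unknown placeholder name throws KeyNotFoundException.
    public static Dictionary<string, string> Parse(string input, string format)
    {
        // Turn every {name} in the format into a named capturing group.
        var pattern = Regex.Replace(format, @"\{(\w+)\}",
            m => string.Format("(?<{0}>{1})", m.Groups[1].Value, Groups[m.Groups[1].Value]));

        var regex = new Regex("^" + pattern + "$", RegexOptions.ExplicitCapture);
        var match = regex.Match(input);
        if (!match.Success)
        {
            return null;
        }

        // GetGroupNames() also reports group "0" (the whole match); skip it.
        return regex.GetGroupNames()
            .Where(n => n != "0")
            .ToDictionary(n => n, n => match.Groups[n].Value);
    }

    public static void Main()
    {
        var input = "JAN 01 00:00:01 <Admin> Action, May have spaces etc.";
        var format = "{month} {day} {hour}:{minute}:{second} <{login}> {the_rest}";
        var results = Parse(input, format);
        foreach (var pair in results)
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

With the format from the question, login comes back as Admin and the_rest as the text after the closing bracket and the following space.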
Upvotes: 1
Reputation: 8952
The following regular expression will do the trick.
^([A-Z]{3})\s*([0-9]{1,2})\s*([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2})\s*<(.+)>\s*(.+)
Tested it using an online regex builder.
It will return 7 captured groups.
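As a rough sketch of how those 7 groups could be read in C#: groups are numbered from 1 in the order their opening parentheses appear, and index 0 is the whole match. The class name PositionalDemo and the method Capture are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PositionalDemo
{
    // Hypothetical helper: returns the 7 captured values, or null if the line does not match.
    public static string[] Capture(string line)
    {
        var m = Regex.Match(line,
            @"^([A-Z]{3})\s*([0-9]{1,2})\s*([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2})\s*<(.+)>\s*(.+)");
        if (!m.Success)
        {
            return null;
        }

        // Groups[0] is the whole match, so copy Groups[1]..Groups[7].
        var values = new string[m.Groups.Count - 1];
        for (int i = 1; i < m.Groups.Count; i++)
        {
            values[i - 1] = m.Groups[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        var values = Capture("JAN 01 00:00:01 <Admin> Action, May have spaces etc.");
        // Order: month, day, hour, minute, second, login, rest
        Console.WriteLine(string.Join(" | ", values));
    }
}
```

Because <(.+)> is greedy, a log message that itself contains a ">" would extend the login capture up to the last ">" on the line.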
Upvotes: 0
Reputation: 9566
You can use this regex:
(?<Month>[A-Z]{3})\s(?<Day>[0-9]{1,2})\s(?<Hour>[0-9]{1,2}):(?<Minute>[0-9]{1,2}):(?<Second>[0-9]{1,2})\s<(?<Login>[^>]+)>(?<Rest>.*)
It's a bit clumsy and complicated, but I hope the example below will get you what you want.
class Foo
{
public string Month { get; set; }
public int Day { get; set; }
public int Hour { get; set; }
public int Minute { get; set; }
public int Second { get; set; }
public string Login { get; set; }
public string Rest { get; set; }
}
string strRegex = @"(?<Month>[A-Z]{3})\s(?<Day>[0-9]{1,2})\s(?<Hour>[0-9]{1,2}):(?<Minute>[0-9]{1,2}):(?<Second>[0-9]{1,2})\s<(?<Login>[^>]+)>(?<Rest>.*)";
RegexOptions myRegexOptions = RegexOptions.None;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = "JAN 01 00:00:01 <Admin> Action, May have spaces etc.";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Build a Foo from the named groups; keep the reference so it can be used afterwards.
var foo = new Foo
{
Month = myMatch.Groups["Month"].Value,
Day = Convert.ToInt32(myMatch.Groups["Day"].Value),
Hour = Convert.ToInt32(myMatch.Groups["Hour"].Value),
Minute = Convert.ToInt32(myMatch.Groups["Minute"].Value),
Second = Convert.ToInt32(myMatch.Groups["Second"].Value),
Login = myMatch.Groups["Login"].Value,
// The Rest group starts right after '>', so trim the leading space.
Rest = myMatch.Groups["Rest"].Value.Trim()
};
}
}
Upvotes: 0
Reputation: 43459
You can also use this as your regular expression and use captured groups:
^(?<Month>\w{3})\s(?<Day>\d{2})\s(?<Hour>\d{2}):(?<Min>\d{2}):(?<Sec>\d{2})\s(?<User>\<(\w.+?)\>)(.+)$
Edit: missed the User part.
Upvotes: 1
Reputation: 116108
string line = "JAN 01 00:00:01 <Admin> Action, May have spaces etc.";
var m = Regex.Match(line, @"(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \<(\w+)\>([\w ]+),([\w ]+)");
var date = DateTime.ParseExact(m.Groups[1].Value,"MMM dd HH:mm:ss",CultureInfo.InvariantCulture);
var user = m.Groups[2].Value;
var action = m.Groups[3].Value;
var text = m.Groups[4].Value;
Upvotes: 3
Reputation: 4519
You can still use Split, splitting on the space character.
Your problem is that you want to keep the spaces after a certain number of splits, so that your "the rest" stays together.
The optional int parameter of Split sets the maximum number of substrings to return, so it might provide the workaround that you are looking for.
http://msdn.microsoft.com/en-us/library/c1bs0eda.aspx
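A minimal sketch of that idea, assuming the entry always has exactly four space-separated fields before the free text. The class name SplitDemo and the method SplitEntry are made up for illustration.

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper: splits into at most 5 parts, so everything after
    // the fourth space stays together in the last element.
    public static string[] SplitEntry(string line)
    {
        return line.Split(new[] { ' ' }, 5);
    }

    public static void Main()
    {
        var parts = SplitEntry("JAN 01 00:00:01 <Admin> Action, May have spaces etc.");

        var month = parts[0];                  // JAN
        var day = parts[1];                    // 01
        var time = parts[2].Split(':');        // 00, 00, 01
        var login = parts[3].Trim('<', '>');   // Admin
        var rest = parts[4];                   // Action, May have spaces etc.

        Console.WriteLine(month + " " + day + " " + string.Join(":", time));
        Console.WriteLine(login);
        Console.WriteLine(rest);
    }
}
```

This stays hardcoded to one field order, so it does not give the configurable format asked for in the question, and a login name containing a space would break it.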
Upvotes: 1