Reputation: 7092
I have a large log file whose lines look like the three-row example below.
\LogFiles\W3SVC1\u_ex12.log:32:2015-01-04 07:11:22 &actor=%7B%22name%22%3A%5B%22Smith%2C%20Steve%22%5D%2C%22mbox%22%3A%5B%22mailto%3ASmith.Steve%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2015-06-08 02:04:13 &actor=%7B%22name%22%3A%5B%22Brown%2C%20Bob%22%5D%2C%22mbox%22%3A%5B%22mailto%3ABrown.Bob%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Franklin%2C%20Francis%22%5D%2C%22mbox%22%3A%5B%22mailto%3AFranklin.Francis%40xyz.com%22%5D%7D&
I need to pull the date, name, and mailto fields that are buried within the log file.
I tried using an online regex generator but only got this far before it seemed to become unwieldy.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            //test string (verbatim literal, so the backslashes in the path are not read as escape sequences)
            string txt=@"\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Franklin%2C%20Francis%22%5D%2C%22mbox%22%3A%5B%22mailto%3AFranklin.Francis%40xyz.com%22%5D%7D&";
            string re1=".*?"; // Non-greedy match on filler
            string re2="((?:(?:[1]{1}\\d{1}\\d{1}\\d{1})|(?:[2]{1}\\d{3}))[-:\\/.](?:[0]?[1-9]|[1][012])[-:\\/.](?:(?:[0-2]?\\d{1})|(?:[3][01]{1})))(?![\\d])"; // YYYYMMDD 1
            Regex r = new Regex(re1+re2,RegexOptions.IgnoreCase|RegexOptions.Singleline);
            Match m = r.Match(txt);
            if (m.Success)
            {
                String yyyymmdd1=m.Groups[1].ToString();
                Console.Write("("+yyyymmdd1.ToString()+")"+"\n");
            }
            Console.ReadLine();
        }
    }
}
Is there a way to do this in C#, with or without regex?
Thanks!
Upvotes: 1
Views: 263
Reputation:
Assuming you use a regex and every line follows this general form, something like this should work:
(?m)^\S+:(?<Date>\d+-\d+-\d+)\s(?:(?!&actor=).)+&actor=(?:%[0-9a-fA-F]{2})*name(?:%[0-9a-fA-F]{2})*(?<LastName>(?:(?!%[0-9a-fA-F]{2}|mbox).)+)(?:%[0-9a-fA-F]{2})+(?<FirstName>(?:(?!%[0-9a-fA-F]{2}|mbox).)*)(?:%[0-9a-fA-F]{2})*mbox(?:%[0-9a-fA-F]{2})+mailto(?:%[0-9a-fA-F]{2})+(?<MailUser>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+(?<MailDomain>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+&
It uses the multiline modifier (?m), set inline as a modifier group at the start of the regex, so that ^ matches at the start of every line.
Formatted:
(?m)
^
\S+
:
(?<Date>                           #_(1 start)
     \d+
     -
     \d+
     -
     \d+
)                                  #_(1 end)
\s
(?:
     (?! &actor= )
     .
)+
&actor=
(?: % [0-9a-fA-F]{2} )*
name
(?: % [0-9a-fA-F]{2} )*
(?<LastName>                       #_(2 start)
     (?:
          (?! % [0-9a-fA-F]{2} | mbox )
          .
     )+
)                                  #_(2 end)
(?: % [0-9a-fA-F]{2} )+
(?<FirstName>                      #_(3 start)
     (?:
          (?! % [0-9a-fA-F]{2} | mbox )
          .
     )*
)                                  #_(3 end)
(?: % [0-9a-fA-F]{2} )*
mbox
(?: % [0-9a-fA-F]{2} )+
mailto
(?: % [0-9a-fA-F]{2} )+
(?<MailUser>                       #_(4 start)
     (?:
          (?! % [0-9a-fA-F]{2} )
          .
     )+
)                                  #_(4 end)
(?: % [0-9a-fA-F]{2} )+
(?<MailDomain>                     #_(5 start)
     (?:
          (?! % [0-9a-fA-F]{2} )
          .
     )+
)                                  #_(5 end)
(?: % [0-9a-fA-F]{2} )+
&
Output:
** Grp 1 [Date] - ( pos 31 , len 10 )
2015-01-04
** Grp 2 [LastName] - ( pos 80 , len 5 )
Smith
** Grp 3 [FirstName] - ( pos 91 , len 5 )
Steve
** Grp 4 [MailUser] - ( pos 133 , len 11 )
Smith.Steve
** Grp 5 [MailDomain] - ( pos 147 , len 7 )
xyz.com
---------------------
** Grp 1 [Date] - ( pos 197 , len 10 )
2015-06-08
** Grp 2 [LastName] - ( pos 246 , len 5 )
Brown
** Grp 3 [FirstName] - ( pos 257 , len 3 )
Bob
** Grp 4 [MailUser] - ( pos 297 , len 9 )
Brown.Bob
** Grp 5 [MailDomain] - ( pos 309 , len 7 )
xyz.com
----------------------
** Grp 1 [Date] - ( pos 359 , len 10 )
2014-08-02
** Grp 2 [LastName] - ( pos 408 , len 8 )
Franklin
** Grp 3 [FirstName] - ( pos 422 , len 7 )
Francis
** Grp 4 [MailUser] - ( pos 466 , len 16 )
Franklin.Francis
** Grp 5 [MailDomain] - ( pos 485 , len 7 )
xyz.com
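For reference, one way to run the pattern above from C# is Regex.Matches, reading the named groups of each match. This is only a minimal sketch: the class and method names (LogRegexDemo, Extract) are made up for illustration, and the pattern is copied unchanged from above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogRegexDemo
{
    // Pattern copied from the answer; (?m) lets ^ anchor at the start of every line.
    private static readonly Regex RxLine = new Regex(
        @"(?m)^\S+:(?<Date>\d+-\d+-\d+)\s(?:(?!&actor=).)+&actor=(?:%[0-9a-fA-F]{2})*name(?:%[0-9a-fA-F]{2})*(?<LastName>(?:(?!%[0-9a-fA-F]{2}|mbox).)+)(?:%[0-9a-fA-F]{2})+(?<FirstName>(?:(?!%[0-9a-fA-F]{2}|mbox).)*)(?:%[0-9a-fA-F]{2})*mbox(?:%[0-9a-fA-F]{2})+mailto(?:%[0-9a-fA-F]{2})+(?<MailUser>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+(?<MailDomain>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+&");

    // Returns one "date last, first user@domain" string per log line that fits the pattern.
    public static List<string> Extract(string log)
    {
        var rows = new List<string>();
        foreach (Match m in RxLine.Matches(log))
        {
            rows.Add(string.Format("{0} {1}, {2} {3}@{4}",
                m.Groups["Date"].Value,
                m.Groups["LastName"].Value,
                m.Groups["FirstName"].Value,
                m.Groups["MailUser"].Value,
                m.Groups["MailDomain"].Value));
        }
        return rows;
    }
}
```

Calling LogRegexDemo.Extract(File.ReadAllText(path)), where path is whatever your log file is called, should give one row per matching line. Lines that do not fit the pattern simply produce no match.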
Also, with a slight modification, you can collect them all into CaptureCollection lists from a single match.
C#
string log =
@"
\LogFiles\W3SVC1\u_ex12.log:32:2015-01-04 07:11:22 &actor=%7B%22name%22%3A%5B%22Smith%2C%20Steve%22%5D%2C%22mbox%22%3A%5B%22mailto%3ASmith.Steve%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2015-06-08 02:04:13 &actor=%7B%22name%22%3A%5B%22Brown%2C%20Bob%22%5D%2C%22mbox%22%3A%5B%22mailto%3ABrown.Bob%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Franklin%2C%20Francis%22%5D%2C%22mbox%22%3A%5B%22mailto%3AFranklin.Francis%40xyz.com%22%5D%7D&
sfgbadfbdfbadfbdab
junk .........
\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Smith%2C%20Joe%22%5D%2C%22mbox%22%3A%5B%22mailto%3ASmith.Joe%40xyz.com%22%5D%7D&
\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Doe%2C%20Jane%22%5D%2C%22mbox%22%3A%5B%22mailto%3ADoe.Jane%40xyz.com%22%5D%7D&
";
Regex RxLog = new Regex(@"(?m)(?:^\S+:(?<Date>\d+-\d+-\d+)\s(?:(?!&actor=).)+&actor=(?:%[0-9a-fA-F]{2})*name(?:%[0-9a-fA-F]{2})*(?<LastName>(?:(?!%[0-9a-fA-F]{2}|mbox).)+)(?:%[0-9a-fA-F]{2})+(?<FirstName>(?:(?!%[0-9a-fA-F]{2}|mbox).)*)(?:%[0-9a-fA-F]{2})*mbox(?:%[0-9a-fA-F]{2})+mailto(?:%[0-9a-fA-F]{2})+(?<MailUser>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+(?<MailDomain>(?:(?!%[0-9a-fA-F]{2}).)+)(?:%[0-9a-fA-F]{2})+&\s*|(?:.*\s))+");
Match logMatch = RxLog.Match(log);
if (logMatch.Success)
{
    CaptureCollection ccDate = logMatch.Groups["Date"].Captures;
    CaptureCollection ccLname = logMatch.Groups["LastName"].Captures;
    CaptureCollection ccFname = logMatch.Groups["FirstName"].Captures;
    CaptureCollection ccUser = logMatch.Groups["MailUser"].Captures;
    CaptureCollection ccDomain = logMatch.Groups["MailDomain"].Captures;
    for (int i = 0; i < ccDate.Count; i++)
        Console.WriteLine("{0} {1}, {2} {3}@{4}", ccDate[i].Value, ccLname[i].Value, ccFname[i].Value, ccUser[i].Value, ccDomain[i].Value );
}
Output:
2015-01-04 Smith, Steve Smith.Steve@xyz.com
2015-06-08 Brown, Bob Brown.Bob@xyz.com
2014-08-02 Franklin, Francis Franklin.Francis@xyz.com
2014-08-02 Smith, Joe Smith.Joe@xyz.com
2014-08-02 Doe, Jane Doe.Jane@xyz.com
Upvotes: 2
Reputation: 2934
What you can do is split the line into several parts, decode the URL part, get the actor parameter, deserialize it into an Actor object, and then use its properties. A quick example would be:
string txt = @"\LogFiles\W3SVC1\u_ex12.log:32:2014-08-02 05:50:37 &actor=%7B%22name%22%3A%5B%22Franklin%2C%20Francis%22%5D%2C%22mbox%22%3A%5B%22mailto%3AFranklin.Francis%40xyz.com%22%5D%7D&";
var parts = txt.Split(' ');
// ParseQueryString URL-decodes the values itself, so pass it the raw (still encoded) query part.
string actorJson = HttpUtility.ParseQueryString(parts[2]).Get("actor");
Actor actor = JsonConvert.DeserializeObject<Actor>(actorJson);
Console.WriteLine(actor.Name + " " + actor.EmailAddress);
You would need to add references to System.Web and Json.Net for it to work, and of course a definition for your Actor class, like:
namespace MyNamespace
{
    public class Actor
    {
        public string[] name { get; set; }
        public string[] mbox { get; set; }
        public string Name { get { return name[0]; } }
        public string EmailAddress { get { return mbox[0].Replace("mailto:", ""); } }
    }
}
Now you just read all the lines with the File class, loop through each of them, and put all the deserialized actors into a List or something similar.
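That loop could look roughly like the sketch below. It is an illustration under a few assumptions, not the answer's exact code: LogEntry and ActorLogReader are hypothetical names, the date is assumed to be whatever follows the last colon in the first space-separated part (as in the sample lines), and the built-in System.Text.Json (.NET Core 3.0 and later) is swapped in for Json.Net only so the sketch needs no extra package.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Web;

// Hypothetical container for the three fields the question asks for.
public sealed class LogEntry
{
    public string Date { get; set; }
    public string Name { get; set; }
    public string EmailAddress { get; set; }
}

public static class ActorLogReader
{
    // Parses lines shaped like "<path>:<n>:<date> <time> <query>".
    // Lines with fewer than three parts, or without an actor parameter, are skipped.
    // An actor value that is not valid JSON makes JsonDocument.Parse throw.
    public static List<LogEntry> Parse(IEnumerable<string> lines)
    {
        var entries = new List<LogEntry>();
        foreach (string line in lines)
        {
            string[] parts = line.Split(' ');
            if (parts.Length < 3) continue;

            // ParseQueryString URL-decodes the value for us.
            string actorJson = HttpUtility.ParseQueryString(parts[2]).Get("actor");
            if (string.IsNullOrEmpty(actorJson)) continue;

            using (JsonDocument doc = JsonDocument.Parse(actorJson))
            {
                JsonElement root = doc.RootElement;
                entries.Add(new LogEntry
                {
                    // Assumption: the date is the text after the last ':' of the first part.
                    Date = parts[0].Substring(parts[0].LastIndexOf(':') + 1),
                    Name = root.GetProperty("name")[0].GetString(),
                    EmailAddress = root.GetProperty("mbox")[0].GetString().Replace("mailto:", "")
                });
            }
        }
        return entries;
    }
}
```

With a real file you would call ActorLogReader.Parse(File.ReadLines(path)), which reads the log lazily instead of loading it all at once.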
Upvotes: 1