Reputation: 165
I have this
var regex = new Regex(@"StartDate:(.*)EndDate:(.*)W.*Status:(.*)");
So this captures values until it hits a W in the string, correct? I need it to stop at a W or an S. I have tried a few different ways but cannot get it to work. Does anyone have any pointers?
More info:
record = record.Replace(" ", "").Replace("\r\n", "").Replace("-", "/");
var regex = new Regex(@"StartDate:(.*)EndDate:(.*)W.*Status:(.*)");
string strStartDate = regex.Match(record).Groups[1].ToString();
string strEndDate = regex.Match(record).Groups[2].ToString();
string Status = regex.Match(record).Groups[3].ToString().StartsWith("In", StringComparison.OrdinalIgnoreCase) ? "Inactive" : "Active";
I am trying to parse a big string of values, and I only want 3 things: Start Date, End Date, and Status (active/inactive). However, there are 3 different values for each (3 start dates, 3 end dates, 3 statuses).
The first 2 strings look like this:
"Start Date:
2014-09-08
End Date:
2017-09-07
Warranty Type:
XXX
Status:
Active
Serial Number/IMEI:
XXXXXXXXXXX
Description:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
The 3rd string looks like this:
"Start Date:
2014-09-08
End Date:
2017-09-07
Status:
Active
Warranty Upgrade Code:
SVC_PRIORITY"
On the last string I do not get the 2 dates. I am guessing it is because of the W.* after the end date.
Upvotes: 1
Views: 125
Reputation: 14079
There is no need to strip the newlines in your example:
List<string> resultList = new List<string>();
var subjectString = @"Start Date: xxxxx
End Date: yyyy
Warranty Type: zzzz
Status: uuuu
Start Date: aaaa
End Date: bbbb
Status: cccc";
Regex regexObj = new Regex(@"Start Date: (.*?)\nEnd Date: (.*?)\n(.|\n)*?Status: (.*)");
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    resultList.Add(matchResult.Groups[1].Value);
    resultList.Add(matchResult.Groups[2].Value);
    resultList.Add(matchResult.Groups[4].Value);
    matchResult = matchResult.NextMatch();
}
Upvotes: 1
Reputation: 31616
Avoid .*
It is a catch-all that gets regex pattern writers into trouble. Instead, build the pattern around something specific that always occurs in the data.
In your data that is the two dates, which have the form \d\d\d\d-\d\d-\d\d.
The rest is anchor text: static text the pattern uses to locate the dates and then skips.
Here is an example that looks for the date patterns. Once a match is found, the regex puts the values into named capture groups, (?<GroupNameHere>...),
and LINQ projects each match into an anonymous object and parses the dates.
Data
Note the first date is reversed as per your example
var data = @"Start Date:
2014-09-08
End Date:
2017-09-07
Status:
Active
Start Date:
2014-09-09
End Date:
2017-09-10
Status:
In-Active
";
Pattern
string pattern = @"
^Start\sDate:\s+ # An anchor of start date that always starts at the BOL
(?<Start>\d\d\d\d-\d\d-\d\d) # actual start date pattern
\s+ # a lot of space including \r\n
^End\sDate:\s+ # End date anchor and space
(?<End>\d\d\d\d-\d\d-\d\d) # pattern of the end date.
\s+ # Same pattern as above for Status
^Status:\s+
(?<Status>[^\s]+)
";
Processing
// ExplicitCapture tells the parser to capture only named groups; unnamed parentheses (..) are not captured.
// Multiline makes ^ and $ match at the beginning and end of each line, not just of the whole buffer.
// IgnorePatternWhitespace allows us to comment the pattern; it does not affect processing.
var results = Regex.Matches(data, pattern, RegexOptions.ExplicitCapture |
                                           RegexOptions.Multiline |
                                           RegexOptions.IgnorePatternWhitespace)
                   .OfType<Match>()
                   .Select(mt => new
                   {
                       Status = mt.Groups["Status"].Value,
                       StartDate = DateTime.Parse(mt.Groups["Start"].Value),
                       EndDate = DateTime.Parse(mt.Groups["End"].Value)
                   });
Upvotes: 0
Reputation: 4504
EDIT: Please try this function, which parses the input using a regex:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Windows.Forms;
private static List<string[]> parseString(string input)
{
    // The Warranty Type line is optional, so it sits in an optional non-capturing group.
    var pattern = @"Start\s+Date:\s+([0-9-]+)\s+End\s+Date:\s+([0-9-]+)\s+(?:Warranty\s+Type:\s+\w+\s+)?Status:\s+(\w+)\s*";
    return Regex.Matches(input, pattern).Cast<Match>().ToList().ConvertAll(m => new string[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value });
}
// To show the result string (str1 holds the text to parse)
var result1 = parseString(str1);
string result_string = string.Join("\n", result1.ConvertAll(r => string.Format("Start Date: {0}\nEnd Date: {1}\nStatus: {2}", r)).ToArray());
MessageBox.Show(result_string);
EDIT2: For the OP's situation, you could call the function from inside the foreach loop like this:
foreach (HtmlElement el in webBrowser1.Document.GetElementsByTagName("div"))
{
    if (el.GetAttribute("className") == "fluid-row Borderfluid")
    {
        // record is the string to parse
        string record = el.InnerText;
        var result = parseString(record);
        var result_string = string.Join("\n", result.ConvertAll(r => string.Format("Start Date: {0}\nEnd Date: {1}\nStatus: {2}", r)).ToArray());
        MessageBox.Show(result_string);
    }
}
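The optional non-capturing group, (?:...)?, is the part of the pattern above that copes with records that have no Warranty Type line. As a rough sketch, the same idea can be applied to the whitespace-stripped record from the question. The class name WarrantyRecordSketch and its Parse method are hypothetical, and the pattern assumes the dates already look like 2014/09/08 (after the question's Replace("-", "/") step) and that the status text is either Active or Inactive.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any answer in this thread.
public static class WarrantyRecordSketch
{
    // Assumes the record was preprocessed as in the question:
    // spaces and line breaks removed, '-' replaced by '/'.
    private static readonly Regex RecordPattern = new Regex(
        @"StartDate:(\d{4}/\d{2}/\d{2})" +   // group 1: start date
        @"EndDate:(\d{4}/\d{2}/\d{2})" +     // group 2: end date
        @"(?:WarrantyType:.*?)?" +           // optional block, skipped when absent
        @"Status:(Active|Inactive)",         // group 3: assumed status values
        RegexOptions.IgnoreCase);

    // Returns { start, end, status }, or null when the record does not match.
    public static string[] Parse(string record)
    {
        Match m = RecordPattern.Match(record);
        if (!m.Success)
        {
            return null;
        }
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }
}
```

Because the dates are matched with \d patterns rather than .*, the capture cannot run past the date, so nothing has to "stop at a W or an S".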
Upvotes: 1
Reputation: 626826
You may replace your code with the following (see the IDEONE demo):
var s = @"Start Date: xxxxx
End Date: xxxx
Warranty Type: xxxx
Status: xxxx";
var res = Regex.Replace(s, @":\s+", ": ") // Remove excessive whitespace
    .Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries) // Split into lines
    .Select(line => line.Split(new[] { ": " }, 2, StringSplitOptions.None)) // Split each line at `:`+space
    .ToDictionary(n => n[0], n => n[1]); // Create a dictionary
string strStartDate = string.Empty;
string strEndDate = string.Empty;
string Status = string.Empty;
string Warranty = string.Empty;
// Demo & variable assignment
if (res.ContainsKey("Start Date")) {
    Console.WriteLine(res["Start Date"]);
    strStartDate = res["Start Date"];
}
if (res.ContainsKey("Warranty Type")) {
    Console.WriteLine(res["Warranty Type"]);
    Warranty = res["Warranty Type"];
}
if (res.ContainsKey("End Date")) {
    Console.WriteLine(res["End Date"]);
    strEndDate = res["End Date"];
}
if (res.ContainsKey("Status")) {
    Console.WriteLine(res["Status"]);
    Status = res["Status"]; // assign the variable declared above; redeclaring it does not compile
}
Note that the best approach is to declare your own class with fields like WarrantyType, StartDate, etc. and initialize it right in the LINQ code.
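One possible shape for such a class is sketched below. The type name WarrantyRecord, its properties, and the FromText method are all hypothetical. The sketch assumes each input string holds a single record with "Key:" labels followed by their values, as in the question, and it leaves a property null when its key is missing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical container type; names are illustrative only.
public sealed class WarrantyRecord
{
    public string StartDate { get; set; }
    public string EndDate { get; set; }
    public string WarrantyType { get; set; }
    public string Status { get; set; }

    public static WarrantyRecord FromText(string text)
    {
        // Join each "Key:" label with its value, split into lines,
        // then split every line at the first ": " into key and value.
        Dictionary<string, string> fields = Regex.Replace(text, @":\s+", ": ")
            .Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Split(new[] { ": " }, 2, StringSplitOptions.None))
            .Where(parts => parts.Length == 2)          // ignore lines without a value
            .GroupBy(parts => parts[0].Trim())          // tolerate repeated keys
            .ToDictionary(g => g.Key, g => g.First()[1].Trim());

        return new WarrantyRecord
        {
            StartDate = Get(fields, "Start Date"),
            EndDate = Get(fields, "End Date"),
            WarrantyType = Get(fields, "Warranty Type"),
            Status = Get(fields, "Status")
        };
    }

    // Returns the value for key, or null when the key is absent.
    private static string Get(IDictionary<string, string> fields, string key)
    {
        string value;
        return fields.TryGetValue(key, out value) ? value : null;
    }
}
```

With this in place the chain of ContainsKey checks collapses into a single call such as WarrantyRecord.FromText(record), and a missing Warranty Type simply shows up as null.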
Upvotes: 0