Reputation: 1684
I created this regex to parse strings like this (I need the values for Bytes and Time):
1463735418 Bytes: 0 Time: 4.297
This is the code I am using:
string writePath = @"C:\final.txt";
string[] lines = File.ReadAllLines(@"C:\union.dat");
foreach (string txt in lines)
{
    string re1 = ".*?"; // Non-greedy match on filler
    string re2 = "\\d+"; // Uninteresting: int
    string re3 = ".*?"; // Non-greedy match on filler
    string re4 = "(\\d+)"; // Integer Number 1
    string re5 = ".*?"; // Non-greedy match on filler
    string re6 = "([+-]?\\d*\\.\\d+)(?![-+0-9\\.])"; // Float 1
    Regex r = new Regex(re1 + re2 + re3 + re4 + re5 + re6, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    Match m = r.Match(txt);
    if (m.Success)
    {
        String int1 = m.Groups[1].ToString();
        String float1 = m.Groups[2].ToString();
        Debug.Write("(" + int1.ToString() + ")" + "(" + float1.ToString() + ")" + "\n");
        File.AppendAllText(writePath, int1.ToString() + ", " + float1.ToString() + Environment.NewLine);
    }
}
This works perfectly when each record is on a single line. However, my actual file spreads each record over several lines, like this:
1463735418
Bytes: 0
Time: 4.297
1463735424
Time: 2.205
1466413696
Time: 2.225
1466413699
1466413702
1466413705
1466413708
1466413711
1466413714
1466413717
1466413720
Bytes: 7037
Time: 59.320
... (arbitrary repetition)
I get garbage data.
Expected Output:
0, 4.297
7037, 59.320
(match only where a Bytes/Time pair exists)
Edit: I am trying something like this, but I still don't get the desired result.
foreach (string txt in lines)
{
    if (txt.StartsWith("Byte"))
    {
        string re1 = ".*?"; // Non-greedy match on filler
        string re2 = "(\\d+)"; // Integer Number 1
        Regex r = new Regex(re1 + re2, RegexOptions.IgnoreCase | RegexOptions.Singleline);
        Match m = r.Match(txt);
        if (m.Success)
        {
            String int1 = m.Groups[1].ToString();
            //Console.Write("(" + int1.ToString() + ")" + "\n");
            httpTable += int1.ToString() + ",";
        }
    }
    if (txt.StartsWith("Time"))
    {
        string re3 = ".*?"; // Non-greedy match on filler
        string re4 = "([+-]?\\d*\\.\\d+)(?![-+0-9\\.])"; // Float 1
        Regex r1 = new Regex(re3 + re4, RegexOptions.IgnoreCase | RegexOptions.Singleline);
        Match m1 = r1.Match(txt);
        if (m1.Success)
        {
            String float1 = m1.Groups[1].ToString();
            //Console.Write("(" + float1.ToString() + ")" + "\n");
            httpTable += float1.ToString() + Environment.NewLine;
        }
    }
}
How can I mend it? Thanks.
Upvotes: 2
Views: 1236
Reputation: 31721
I recommend using lookbehinds to qualify the Time and Bytes values; any number that is not preceded by one of those labels falls into a plain integer category. Regex named captures then tell you which kind of value each match is.
string data = "1463735418 Bytes: 0 Time: 4.297 1463735424 Time: 2.205 1466413696 Time: 2.225 1466413699 1466413702 1466413705 1466413708 1466413711 1466413714 1466413717 1466413720 Bytes: 7037 Time: 59.320";
string pattern = @"
(?<=Bytes:\s)(?<Bytes>\d+)     # Lookbehind for the bytes
|                              # Or
(?<=Time:\s)(?<Time>[\d.]+)    # Lookbehind for time
|                              # Or
(?<Integer>\d+)                # Most likely it's just an integer.
";
// Requires: using System.Linq; using System.Text.RegularExpressions;
var results = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
    .OfType<Match>()
    .Select(mt => new
    {
        IsInteger = mt.Groups["Integer"].Success,
        IsTime = mt.Groups["Time"].Success,
        IsByte = mt.Groups["Bytes"].Success,
        strMatch = mt.Groups[0].Value,
        AsInt = mt.Groups["Integer"].Success ? int.Parse(mt.Groups["Integer"].Value) : -1,
        AsByte = mt.Groups["Bytes"].Success ? int.Parse(mt.Groups["Bytes"].Value) : -1,
        // Invariant culture so that "4.297" parses the same way on every locale
        AsTime = mt.Groups["Time"].Success ? double.Parse(mt.Groups["Time"].Value, System.Globalization.CultureInfo.InvariantCulture) : -1.0,
    })
    .ToList();
The result is an IEnumerable with one anonymous-type object per match, each carrying three Is* flags and the corresponding As* converted values (or -1 where the match is of a different kind).
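The classification above does not by itself produce the "bytes, time" rows the question asks for. One possible way to get there, sketched under assumptions, is to walk the matches in order and emit a row only when a Time value directly follows a Bytes value. The class name MatchPairing, the method PairUp and the pendingBytes variable are hypothetical names introduced for this illustration; the pattern is the same three-way alternation as in the answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchPairing
{
    // Same three-way alternation as in the answer; the inline comments are
    // legal because of RegexOptions.IgnorePatternWhitespace.
    private static readonly Regex Token = new Regex(@"
        (?<=Bytes:\s)(?<Bytes>\d+)     # value after the Bytes label
      | (?<=Time:\s)(?<Time>[\d.]+)    # value after the Time label
      | (?<Integer>\d+)                # any other integer (the timestamps)
    ", RegexOptions.IgnorePatternWhitespace);

    // Returns one "bytes, time" string for every Bytes value that is
    // immediately followed by a Time value. Works on single-line or
    // multi-line input, since \s also matches a space after the colon
    // and the scan ignores everything between tokens.
    public static List<string> PairUp(string data)
    {
        var pairs = new List<string>();
        string pendingBytes = null; // last Bytes value still waiting for its Time

        foreach (Match m in Token.Matches(data))
        {
            if (m.Groups["Bytes"].Success)
            {
                pendingBytes = m.Groups["Bytes"].Value;
            }
            else if (m.Groups["Time"].Success)
            {
                if (pendingBytes != null)
                    pairs.Add(pendingBytes + ", " + m.Groups["Time"].Value);
                pendingBytes = null; // a Time value closes the record either way
            }
            else
            {
                pendingBytes = null; // a timestamp starts a new record
            }
        }
        return pairs;
    }
}
```

It could be called as MatchPairing.PairUp(File.ReadAllText(@"C:\union.dat")), and each returned string written to the output file. Values are kept as strings here so that 59.320 keeps its trailing zero.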
Upvotes: 3
Reputation: 80647
Since you only need values for Bytes: ... and Time: ..., use the exact strings, instead of fillers:
Bytes
Bytes: (\d+)
Time
Time: ([-+]?\d*\.\d+)
Combined (decimal alternative first, so that 4.297 is not cut short at 4)
(Bytes|Time): ([-+]?\d*\.\d+|\d+)
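As a rough sketch of how the exact-string patterns might be applied to the multi-line file, the loop below checks each line against an anchored Bytes pattern and an anchored Time pattern, and writes a row only when a Time line comes right after a Bytes line. The names LineParser, Extract and pendingBytes are hypothetical, and the sign in the Time pattern is made optional here, which is an assumption based on the unsigned sample values.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineParser
{
    // Anchored, literal-label patterns; no filler needed.
    private static readonly Regex BytesLine = new Regex(@"^Bytes: (\d+)$");
    // Assumption: the sign is optional, since the sample data has none.
    private static readonly Regex TimeLine = new Regex(@"^Time: ([-+]?\d*\.\d+)$");

    // Returns "bytes, time" for each Bytes line immediately followed by a Time line.
    public static List<string> Extract(IEnumerable<string> lines)
    {
        var rows = new List<string>();
        string pendingBytes = null; // Bytes value waiting for the next line

        foreach (string raw in lines)
        {
            string line = raw.Trim();

            Match b = BytesLine.Match(line);
            if (b.Success)
            {
                pendingBytes = b.Groups[1].Value;
                continue;
            }

            Match t = TimeLine.Match(line);
            if (t.Success && pendingBytes != null)
                rows.Add(pendingBytes + ", " + t.Groups[1].Value);

            // Any line other than a Bytes line ends the current record.
            pendingBytes = null;
        }
        return rows;
    }
}
```

A call such as LineParser.Extract(File.ReadLines(@"C:\union.dat")) would then yield the rows to append to the output file. Time lines without a preceding Bytes line and bare timestamps are skipped, which matches the "only where a pair exists" requirement in the question.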
Upvotes: 0