Jishan

Reputation: 1684

Regex To Parse Int and String

I created this regex to parse strings like the following (I need the values for Bytes and Time):

1463735418    Bytes: 0    Time: 4.297 

This is my code:

string writePath = @"C:\final.txt";
string[] lines = File.ReadAllLines(@"C:\union.dat");
foreach (string txt in lines)
{

    string re1 = ".*?"; // Non-greedy match on filler
    string re2 = "\\d+";    // Uninteresting: int
    string re3 = ".*?"; // Non-greedy match on filler
    string re4 = "(\\d+)";  // Integer Number 1
    string re5 = ".*?"; // Non-greedy match on filler
    string re6 = "([+-]?\\d*\\.\\d+)(?![-+0-9\\.])";    // Float 1

    Regex r = new Regex(re1 + re2 + re3 + re4 + re5 + re6, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    Match m = r.Match(txt);
    if (m.Success)
    {
        String int1 = m.Groups[1].ToString();
        String float1 = m.Groups[2].ToString();
        Debug.Write("(" + int1.ToString() + ")" + "(" + float1.ToString() + ")" + "\n");
        File.AppendAllText(writePath, int1.ToString() + ", " + float1.ToString() + Environment.NewLine);

    }
}

This works perfectly when the data is on a single line. However, my file actually looks like this:

1463735418
Bytes: 0
Time: 4.297
1463735424
Time: 2.205
1466413696
Time: 2.225
1466413699
1466413702
1466413705
1466413708
1466413711
1466413714
1466413717
1466413720
Bytes: 7037
Time: 59.320
... (arbitrary repetition)

I get garbage data.

Expected Output: 
0, 4.297 
7037, 59.320

(match only where a Bytes-Time pair exists)

Edit: I am trying something like this, but I still don't get the desired result.

foreach (string txt in lines)
            {

                if (txt.StartsWith("Byte"))
                {
                    string re1 = ".*?"; // Non-greedy match on filler
                    string re2 = "(\\d+)";  // Integer Number 1

                    Regex r = new Regex(re1 + re2, RegexOptions.IgnoreCase | RegexOptions.Singleline);
                    Match m = r.Match(txt);
                    if (m.Success)
                    {
                        String int1 = m.Groups[1].ToString();
                        //Console.Write("(" + int1.ToString() + ")" + "\n");
                        httpTable += int1.ToString() + ",";
                    }
                }
                if (txt.StartsWith("Time"))
                {
                    string re3 = ".*?"; // Non-greedy match on filler
                    string re4 = "([+-]?\\d*\\.\\d+)(?![-+0-9\\.])";    // Float 1

                    Regex r1 = new Regex(re3 + re4, RegexOptions.IgnoreCase | RegexOptions.Singleline);
                    Match m1 = r1.Match(txt);
                    if (m1.Success)
                    {
                        String float1 = m1.Groups[1].ToString();
                        //Console.Write("(" + float1.ToString() + ")" + "\n");
                        httpTable += float1.ToString() + Environment.NewLine;
                    }
                }

            }

How can I mend it? Thanks.

Upvotes: 2

Views: 1236

Answers (2)

ΩmegaMan

Reputation: 31721

I recommend using a lookbehind to qualify the Time and Bytes values, and letting anything that matches neither fall into the integer category. Named captures then tell you what was found for each match.

string data = "1463735418 Bytes: 0 Time: 4.297 1463735424 Time: 2.205 1466413696 Time: 2.225 1466413699 1466413702 1466413705 1466413708 1466413711 1466413714 1466413717 1466413720 Bytes: 7037 Time: 59.320";

string pattern = @"
    (?<=Bytes:\s)(?<Bytes>\d+)   # Lookbehind for the bytes
    |                            # Or
    (?<=Time:\s)(?<Time>[\d.]+)  # Lookbehind for time
    |                            # Or
    (?<Integer>\d+)              # Otherwise it is just an integer
    ";

var results = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
     .OfType<Match>()
     .Select(mt => new
                   {
                       IsInteger = mt.Groups["Integer"].Success,
                       IsTime = mt.Groups["Time"].Success,
                       IsByte = mt.Groups["Bytes"].Success,
                       strMatch = mt.Groups[0].Value,
                       AsInt  = mt.Groups["Integer"].Success ? int.Parse(mt.Groups["Integer"].Value) : -1,
                       AsByte = mt.Groups["Bytes"].Success ? int.Parse(mt.Groups["Bytes"].Value) : -1,
                       AsTime = mt.Groups["Time"].Success ? double.Parse(mt.Groups["Time"].Value) : -1.0,
                   });

The result is an IEnumerable with one anonymous-type object per match, carrying the three Is... flags and the corresponding As... converted values (-1 when that group did not match).

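
To get from these classified matches to the "Bytes, Time" rows the question asks for, one option is to remember the most recent Bytes value and emit a row only when a Time match follows it directly. This is a minimal sketch, assuming that a bare integer (the timestamp) always starts a new record; the names BytesTimePairing and PairUp are made up for illustration and are not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BytesTimePairing
{
    // Same three-way alternation as in the answer above.
    private const string Pattern = @"
        (?<=Bytes:\s)(?<Bytes>\d+)   # Lookbehind for the bytes
        |
        (?<=Time:\s)(?<Time>[\d.]+)  # Lookbehind for time
        |
        (?<Integer>\d+)              # Anything else is a plain integer
        ";

    // Hypothetical helper: returns one "bytes, time" row per Bytes/Time pair.
    public static List<string> PairUp(string data)
    {
        var rows = new List<string>();
        string pendingBytes = null; // last Bytes value not yet paired with a Time

        foreach (Match mt in Regex.Matches(data, Pattern, RegexOptions.IgnorePatternWhitespace))
        {
            if (mt.Groups["Bytes"].Success)
            {
                pendingBytes = mt.Groups["Bytes"].Value;
            }
            else if (mt.Groups["Time"].Success)
            {
                // Emit a row only when a Bytes value came right before this Time.
                if (pendingBytes != null)
                {
                    rows.Add(pendingBytes + ", " + mt.Groups["Time"].Value);
                }
                pendingBytes = null;
            }
            else
            {
                // Assumption: a bare integer (timestamp) starts a new record.
                pendingBytes = null;
            }
        }

        return rows;
    }
}
```

Because the pattern does not care whether values are separated by spaces or line breaks, the whole file can be passed in at once, for example BytesTimePairing.PairUp(File.ReadAllText(@"C:\union.dat")), and the rows written out with File.WriteAllLines(@"C:\final.txt", rows).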

Upvotes: 3

hjpotter92

Reputation: 80647

Since you only need values for Bytes: ... and Time: ..., use the exact strings, instead of fillers:

For capturing Bytes

Bytes: (\d+)

For capturing Time

Time: ([-+]?\d*\.\d+)

Generic pattern to capture both

(Bytes|Time): ([-+]?\d*\.\d+|\d+)
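
Since the expected output only wants rows where a Bytes line is directly followed by a Time line, the two exact-string patterns can also be joined into one and run over the whole file text instead of line by line. The following is a rough sketch of that idea, not something this answer spells out: it assumes the Time value always contains a decimal point (with an optional sign), and the names AdjacentPairs and Extract are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AdjacentPairs
{
    // A Bytes entry immediately followed by a Time entry.
    // \s+ between them also matches the line break, so the pair may span two lines.
    private static readonly Regex Pair =
        new Regex(@"Bytes:\s*(\d+)\s+Time:\s*([-+]?\d*\.\d+)");

    // Hypothetical helper: returns one "bytes, time" row per adjacent pair.
    public static List<string> Extract(string text)
    {
        var rows = new List<string>();
        foreach (Match m in Pair.Matches(text))
        {
            rows.Add(m.Groups[1].Value + ", " + m.Groups[2].Value);
        }
        return rows;
    }
}
```

With the file from the question this would be called as AdjacentPairs.Extract(File.ReadAllText(@"C:\union.dat")). Time lines that follow a bare timestamp are skipped, because the pattern requires "Bytes:" right before them.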

Upvotes: 0
