Ralms

Reputation: 532

Regex groups expression not capturing content

I'm trying to create a large regex expression that captures 6 groups. It is going to be used to parse Android log lines that have the following format:

2020-03-10T14:09:13.3250000 VERB    CallingClass    17503   20870   Whatever content: this log line had (etc)

The expression I've created so far is the following:

    (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(\w{+})\t(\d{5})\t(\d{5})\t(.*$)

The lines in this case are tab-separated, although the application I'm developing will be dynamic enough that this is not always the case, so I feel regex is still the best option even if it is heavier than performing a split.

Breaking down the groups in more detail, following my thought process:

  1. Matches the date (I'm considering changing this to a fixed number of characters instead)

    (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})

  2. Match a block of 4 characters

    ([A-Za-z]{4})

  3. Match any number of characters until the next tab

    (\w{+})

  4. Match a block of 5 digits, 2 times

    \t(\d{5})

  5. At last, match everything else until the end of the line.

    \t(.*$)

If I reduce the expression to the following, it works:

    (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(.*$)

This leaves out 3 of the groups: the word and the 2 number blocks.

Any idea why this is?

Thank you.

Upvotes: 0

Views: 61

Answers (1)

juharr

Reputation: 32296

The problem is that \w{+} matches a single word character followed by one or more literal { characters and then a literal } character. If you want one or more word characters, just use the plus without the curly braces. Curly braces are meant for specifying an exact count or a range, and they match literal curly braces when their contents do not follow that format.

    (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(\w+)\t(\d{5})\t(\d{5})\t(.*$)
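To make the brace behavior concrete, here is a minimal sketch. It uses Python's `re` module, which is an assumption on my part (the question does not name a language) and which, as far as I know, treats these particular constructs the same way. The helper `full` and the variable names are made up for illustration.

```python
import re

# Braces only act as a quantifier when they hold a count or a range.
literal_braces = re.compile(r"\w{+}")  # one word char, one or more '{', then '}'
one_or_more = re.compile(r"\w+")       # one or more word characters
counted = re.compile(r"\w{2,5}")       # between two and five word characters


def full(pattern, text):
    """Return True when the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None


print(full(literal_braces, "a{{}"))          # the odd literal form
print(full(literal_braces, "CallingClass"))  # a plain word
print(full(one_or_more, "CallingClass"))
print(full(counted, "VERB"))
```

Only the first pattern needs literal brace characters in the input, which is why the original expression never got past the third field.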

I highly recommend using https://regex101.com/ for its explanation panel, to see whether your expression matches up with what you want when spelled out in words. However, for testing an expression intended for use in C#, you should use something else like http://regexstorm.net/tester
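For a quick local check alongside those testers, the corrected expression can be run against the sample line from the question. This is only a sketch: it uses Python's `re` module rather than C#, it assumes the fields really are separated by single tabs as the question states, and the names `LOG_LINE`, `fields`, and `sample` are hypothetical.

```python
import re

# Corrected pattern from above; only \w{+} was changed to \w+.
LOG_LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(\w+)"
    r"\t(\d{5})\t(\d{5})\t(.*$)"
)

# Sample line from the question, rebuilt with explicit tab separators.
fields = [
    "2020-03-10T14:09:13.3250000",
    "VERB",
    "CallingClass",
    "17503",
    "20870",
    "Whatever content: this log line had (etc)",
]
sample = "\t".join(fields)

match = LOG_LINE.match(sample)
groups = match.groups() if match else ()

# Print each captured group next to its group number.
for number, value in enumerate(groups, start=1):
    print(number, value)
```

If the match fails, `groups` is empty and nothing is printed, which makes it easy to spot that some part of the expression is not lining up with the input.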

Upvotes: 3
