Reputation: 751

Parse multiple XML tags using regex

I need to extract the values of a few tags from an XML document. This must be done with regex (don't ask me why :) ).

For example:

 <name>AAA</name>
 <id>1234</id>
 <gender>M</gender>

I know the regex pattern needed for each tag:

string name =  "(?<=<name>).+?(?=</name>)";
string id = "(?<=<id>).+?(?=</id>)";
string gender = "(?<=<gender>).+?(?=</gender>)";

I just don't know how to init the Regex object to handle all of them.

I can do:

private static readonly Regex rgx1 = new Regex(name);
private static readonly Regex rgx2 = new Regex(id);
private static readonly Regex rgx3 = new Regex(gender);

but I'm guessing that's a terrible waste....

So my question is: how to init a single Regex to handle multiple patterns?

And once I did it, how to extract the values from it?

P.S.: I'm programming in C#, if anyone needs to know.

Thanks a lot!

Upvotes: 1

Answers (4)

Koder101

Reputation: 892

A more generic solution, where you don't even have to know the XML tag names in advance:

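using System;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string path = @"C:\TestFile.xml";
        string input = File.ReadAllText(path);

        // Group 1 is the tag name, group 2 its text; \1 requires the same name in the closing tag.
        // [^<>/\s]+ and the lazy .*? keep one match from running across several tags on a line.
        string pattern = @"<([^<>/\s]+)>(.*?)</\1>";

        foreach (Match m in Regex.Matches(input, pattern))
        {
            Console.WriteLine(m.Groups[2].Value);
            Console.WriteLine();
        }
    }
}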

Use m.Groups[1].Value to get the name of the XML tag.
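The same backreference idea can be wrapped in a small helper that returns both groups as name/value pairs. This is only a sketch: `TagScanner` and `Scan` are hypothetical names that do not appear in the answer, and the pattern assumes flat elements with no attributes and no nesting (a nested element would be returned as part of its parent's value).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class TagScanner
{
    // Group 1: tag name (no '<', '>', '/' or whitespace).
    // Group 2: element text, matched lazily up to the first closing tag with the same name.
    // Singleline lets '.' match newlines, so values may span lines.
    private static readonly Regex AnyTag = new Regex(
        @"<([^<>/\s]+)>(.*?)</\1>",
        RegexOptions.Compiled | RegexOptions.Singleline);

    public static List<KeyValuePair<string, string>> Scan(string xml)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in AnyTag.Matches(xml))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value,   // tag name
                m.Groups[2].Value)); // tag value
        }
        return result;
    }
}
```

A list of pairs is used rather than a dictionary so that repeated tags are kept in document order.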

Upvotes: 2

Michael Kay

Reputation: 163675

You can't expect the kind of person who answers questions on this list to accept "don't ask me why" as a constraint. No self-respecting software engineer would accept a demand to use the wrong design for the task without first asking why.

Upvotes: 2

Dominic Cronin

Reputation: 6201

You say "don't ask me why", but I'm afraid I'm going to invoke programmer's prerogative and ask you why. If nothing else, because the solution will vary based on what the actual problem is. So for example, even using regexes, if you take misha's example (assuming it's fixed up to process the whitespace between the elements properly), it will only work on exactly the XML you posted.

In other words, with XML like this:

<name>AAA</name>
<id>1234</id>

the match would fail.

The purpose of XML is to allow for generic processing of this kind of data. Now sure, you can fix up the regex to make sure it deals with a missing gender tag, but if your real-world case is even a little more complex than your example, you will end up with a very complex regex indeed, and the responsibility for ensuring it performs well will fall on you. (Good quality modern XML parsers are highly tuned for good performance.)
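For illustration, one way such a "fixed up" regex might look is a single pattern that alternates over the known tag names, so it does not depend on element order and simply skips tags that are absent. This is a sketch under the assumption of flat elements without attributes; `PersonFields` and `Extract` are hypothetical names, not something proposed in the answers.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: one Regex instance for all three tags.
public static class PersonFields
{
    // (?<tag>...) captures which tag matched; \k<tag> requires the same name in the closing tag.
    private static readonly Regex Field = new Regex(
        @"<(?<tag>name|id|gender)>(?<value>.*?)</\k<tag>>",
        RegexOptions.Compiled | RegexOptions.Singleline);

    public static Dictionary<string, string> Extract(string xml)
    {
        var values = new Dictionary<string, string>();
        foreach (Match m in Field.Matches(xml))
        {
            // If a tag occurs more than once, the last occurrence wins.
            values[m.Groups["tag"].Value] = m.Groups["value"].Value;
        }
        return values;
    }
}
```

With the two-element input above, `PersonFields.Extract(xml)` would contain entries for `name` and `id` and no `gender` key, rather than failing outright. Anything beyond this flat shape (attributes, nesting, CDATA, entities) is exactly where the regex starts to grow.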

So there you have it: to answer your question properly, we need to know the actual problem, and in this context, a constraint such as "you must use regexes" is quite interesting.

Say for example, that the XML in question isn't actually well-formed XML, so an XML parser would fall at the first hurdle. Knowing this would allow us to question whether the problem could be broken down into simpler parts, such as first extracting a well-formed XML fragment.
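As a rough sketch of that two-step idea: a regex pulls out only the well-formed fragment, and an XML parser does the rest. The wrapper element name `person` and the names `FragmentReader` and `TryReadPerson` are assumptions made up for this example, since the question does not show the surrounding document.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper; assumes the useful data sits inside a <person> element.
public static class FragmentReader
{
    // Lazily match from the first <person> to the nearest </person>, across lines.
    private static readonly Regex Fragment = new Regex(
        @"<person>.*?</person>",
        RegexOptions.Compiled | RegexOptions.Singleline);

    // Returns the parsed fragment, or null when no fragment is found.
    // XElement.Parse throws XmlException if the extracted text is not well-formed.
    public static XElement TryReadPerson(string raw)
    {
        Match m = Fragment.Match(raw);
        return m.Success ? XElement.Parse(m.Value) : null;
    }
}
```

Once the fragment is parsed, values can be read with the parser instead of further regexes, for example `FragmentReader.TryReadPerson(raw)?.Element("name")?.Value`.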

There could be other reasons, but whatever the reason is, it is crucial to the solution. Please share.

Upvotes: 2

misha

Reputation: 2889

You can try this:

string input = @"<name>AAA</name>
<id>1234</id>
<gender>M</gender>";
// The line breaks inside the pattern are matched literally, so they must be
// identical to the ones in the input.
string pattern = @"<name>(?<name>.+)</name>
<id>(?<id>.+)</id>
<gender>(?<gender>.+)</gender>";
Match m = Regex.Match(input, pattern);
Console.WriteLine(m.Groups["name"]);
Console.WriteLine(m.Groups["id"]);
Console.WriteLine(m.Groups["gender"]);
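A variant of the same single-pattern approach that tolerates arbitrary whitespace between the elements and reports whether the match succeeded could look like the sketch below. `PersonRecord` and `TryParse` are hypothetical names, and the pattern still assumes all three elements are present in this exact order.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper built around the named-group pattern above.
public static class PersonRecord
{
    // \s* accepts any run of spaces, tabs, or line breaks between elements.
    // .+? is lazy so each value stops at its own closing tag.
    private static readonly Regex Record = new Regex(
        @"<name>(?<name>.+?)</name>\s*<id>(?<id>.+?)</id>\s*<gender>(?<gender>.+?)</gender>",
        RegexOptions.Compiled | RegexOptions.Singleline);

    public static bool TryParse(string xml, out string name, out string id, out string gender)
    {
        Match m = Record.Match(xml);
        // On a failed match the groups are empty, so the out values are empty strings.
        name = m.Groups["name"].Value;
        id = m.Groups["id"].Value;
        gender = m.Groups["gender"].Value;
        return m.Success;
    }
}
```

Checking the return value before using the out parameters avoids silently treating a non-matching document as a record with empty fields.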

Upvotes: 3
