IzoDark Izo

Reputation: 51

Retrieving text from XML with C#

I have a text file that contains this:

<Person>
    <Prenom>Jack</Prenom>
    <Nom>Jhon</Nom>
    <Adresse>4 rue de la Mélandine</Adresse>
    <Tél></Tél>
    <Email>[email protected]</Email>
    <PhotoPath>c:\Program Files\Zonedetec\Gestionnaire de tâche v2\Img\5295f1ea-372a-4f2f-8f32-c52e8a48cc0839105.png</PhotoPath>
    <Age>19</Age>
    <Id>4640434</Id>
</Person>
<Person>
    <Prenom>Jean</Prenom>
    <Nom>Delamar</Nom>
    <Adresse>13 rue de la Mélandine</Adresse>
    <Tél></Tél>
    <Email>[email protected]</Email>
    <PhotoPath>c:\Program Files\Zonedetec\Gestionnaire de tâche v2\Img\5295f1ea-372a-4f2f-8f32-c52e8a48cc0839105.png</PhotoPath>
    <Age>19</Age>
    <Id>4640434</Id>
</Person>

I would like to retrieve all the values between the <Person> tags. For example, I would like to get a list of the values (here 2) between <Person> and </Person>.

How could I do this?

I tried this:

internal static void LoadPerson()
{
    string data = File.ReadAllText(Main.PersonnePath);

    Regex regex = new Regex("<Person>(.*)</Person>");
    var v = regex.Match(data);
    string s = v.Groups[1].ToString();

    MessageBox.Show(s);
}

Except that s is always empty.

Can you help me? Thank you.

Upvotes: 2

Views: 709

Answers (2)

Pavel Anikhouski

Reputation: 23228

Since your file is in XML format, you can use XmlSerializer to read it; it's less painful than parsing it manually.

First, create a Person class (or generate one in Visual Studio with Edit -> Paste Special -> Paste XML As Classes):

[Serializable]
public class Person
{
    private string _prenomField;
    private string _nomField;
    private string _adresseField;
    private object _télField;
    private string _emailField;
    private string _photoPathField;
    private byte _ageField;
    private uint _idField;

    public string Prenom
    {
        get => _prenomField;
        set => _prenomField = value;
    }

    public string Nom
    {
        get => _nomField;
        set => _nomField = value;
    }

    public string Adresse
    {
        get => _adresseField;
        set => _adresseField = value;
    }

    public object Tél
    {
        get => _télField;
        set => _télField = value;
    }

    public string Email
    {
        get => _emailField;
        set => _emailField = value;
    }

    public string PhotoPath
    {
        get => _photoPathField;
        set => _photoPathField = value;
    }

    public byte Age
    {
        get => _ageField;
        set => _ageField = value;
    }

    public uint Id
    {
        get => _idField;
        set => _idField = value;
    }
}

Then update the structure of the file a little (an XML document must have exactly one root element):

<?xml version="1.0" encoding="utf-8" ?>
<people>
  <Person>
    <Prenom>Jack</Prenom>
    <Nom>Jhon</Nom>
    <Adresse>4 rue de la Mélandine</Adresse>
    <Tél></Tél>
    <Email>[email protected]</Email>
    <PhotoPath>c:\Program Files\Zonedetec\Gestionnaire de tâche v2\Img\5295f1ea-372a-4f2f-8f32-c52e8a48cc0839105.png</PhotoPath>
    <Age>19</Age>
    <Id>4640434</Id>
  </Person>
  <Person>
    <Prenom>Jean</Prenom>
    <Nom>Delamar</Nom>
    <Adresse>13 rue de la Mélandine</Adresse>
    <Tél></Tél>
    <Email>[email protected]</Email>
    <PhotoPath>c:\Program Files\Zonedetec\Gestionnaire de tâche v2\Img\5295f1ea-372a-4f2f-8f32-c52e8a48cc0839105.png</PhotoPath>
    <Age>19</Age>
    <Id>4640434</Id>
  </Person>
</people>

And finally, parse it:

var mySerializer = new XmlSerializer(typeof(Person[]), new XmlRootAttribute("people"));
Person[] people;
using (var fileStream = new FileStream(Main.PersonnePath, FileMode.Open))
{
    people = (Person[])mySerializer.Deserialize(fileStream);
}

Don't forget to add the using System.Xml.Serialization; directive. After deserialization, the people array will contain all the values you need, and you can format them into whatever string you want. The best option here is to override the ToString() method of the Person class to get the string representation you need.
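A minimal sketch of that ToString() override, assuming a trimmed-down Person with only Prenom, Nom and Age (the other properties of the class above work the same way; XmlSerializer simply ignores elements that have no matching property). It deserializes from an in-memory string instead of Main.PersonnePath so it runs on its own; PersonDemo and Parse are hypothetical names used only for illustration.

```csharp
using System.IO;
using System.Xml.Serialization;

// Trimmed-down version of the Person class above (assumption: only three properties kept).
public class Person
{
    public string Prenom { get; set; }
    public string Nom { get; set; }
    public byte Age { get; set; }

    // Custom string representation, picked up by string.Join, interpolation, etc.
    public override string ToString() => $"{Prenom} {Nom} ({Age})";
}

public static class PersonDemo
{
    // Hypothetical helper: deserializes a <people> document into a Person array.
    public static Person[] Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Person[]), new XmlRootAttribute("people"));
        using (var reader = new StringReader(xml))
        {
            return (Person[])serializer.Deserialize(reader);
        }
    }
}
```

With the override in place, something like MessageBox.Show(string.Join(Environment.NewLine, people)) shows one readable line per person.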

Upvotes: 3

Lutti Coelho

Reputation: 2264

If you only need these values as plain text, you can use a regular expression, XmlSerializer, or LINQ to XML.

What you need to consider before choosing one approach over another is:

1) What do I need to do with this data?

1.a) If you only need the plain text inside each tag, and you will not do any validation, calculation, or re-parsing, you can easily use either method.

1.a.1) Using a regular expression:

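    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public List<string> GetValueByRegex(string input)
    {
        // [\s\S] matches any character, including newlines; the lazy *? stops at the first </Person>
        string pattern = @"<Person>([\s\S]*?)</Person>";

        var matches = Regex.Matches(input, pattern);

        // Regex.Matches only returns successful matches, so an empty collection means "nothing found"
        if (matches.Count == 0)
            return null;

        var result = new List<string>();
        foreach (Match match in matches)
        {
            result.Add(match.Groups[1].Value);
        }
        return result;
    }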
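For reference, the pattern in the question returns an empty string because . does not match a newline unless RegexOptions.Singleline is set, and each Person element spans several lines (Match also only returns the first match). The [\s\S]*? above sidesteps the newline issue; a sketch of the equivalent fix that keeps the question's own pattern, assuming Singleline plus a lazy .*? so that a single match does not swallow both Person elements. PersonRegex and GetPersonBodies are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PersonRegex
{
    // Singleline lets '.' match '\n'; the lazy '*?' stops at the first </Person>.
    private static readonly Regex PersonPattern =
        new Regex("<Person>(.*?)</Person>", RegexOptions.Singleline);

    // Returns the raw text between each <Person> and </Person> pair, in document order.
    public static List<string> GetPersonBodies(string data)
    {
        var result = new List<string>();
        foreach (Match match in PersonPattern.Matches(data))
        {
            result.Add(match.Groups[1].Value);
        }
        return result;
    }
}
```

With a greedy .* and Singleline, the single match would run from the first <Person> to the last </Person>, which is why the lazy quantifier matters here.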

1.a.2) Using XDocument to parse the XML string:

Important: XDocument requires your XML to have a single root element. Since your XML has two top-level Person elements, I wrapped them in one with string interpolation: $"<root>{input}</root>".

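    using System;
    using System.Collections.Generic;
    using System.Xml.Linq;

    public List<string> GetValueByXmlParse(string input)
    {
        var result = new List<string>();
        // Wrap the Person elements in a single root element so XDocument can parse them
        var wrappedInSingleRoot = $"<root>{input}</root>";

        XDocument xmlDocument = XDocument.Parse(wrappedInSingleRoot);
        foreach (var personXml in xmlDocument.Root.Elements("Person"))
        {
            // Concatenates the child nodes (<Prenom>Jack</Prenom><Nom>Jhon</Nom>...) into one string
            result.Add(String.Concat(personXml.Nodes()));
        }
        return result;
    }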
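If you would rather not build a wrapper string (or change the file), XmlReader can read content with several top-level elements when ConformanceLevel.Fragment is set. A minimal sketch, assuming you want the Prenom value of each Person; FragmentReader and GetPrenoms are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class FragmentReader
{
    // Reads every top-level <Person> element without requiring a single root element.
    public static List<string> GetPrenoms(string input)
    {
        var result = new List<string>();
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment, // allow several top-level elements
            IgnoreWhitespace = true                       // skip the indentation between elements
        };

        using (var reader = XmlReader.Create(new StringReader(input), settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Person")
                {
                    // ReadFrom consumes the whole element and leaves the reader on the next node
                    var person = (XElement)XNode.ReadFrom(reader);
                    result.Add((string)person.Element("Prenom")); // null if <Prenom> is missing
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return result;
    }
}
```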

1.b) If you are going to do anything with the data you extract from your XML, it is better to parse it into an object.

You can have Visual Studio generate a class by copying the XML and clicking Edit > Paste Special > Paste XML As Classes.

@PavelAnikhouski has already shared a good example of that.
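LINQ to XML, mentioned at the start of this answer, can also project each Person element straight into an object without XmlSerializer. A minimal sketch, assuming a hypothetical PersonInfo class that keeps only a few fields; XElement's explicit conversions such as (int)element handle the type conversion.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical lightweight type; add the other fields as needed.
public class PersonInfo
{
    public string Prenom { get; set; }
    public string Nom { get; set; }
    public int Age { get; set; }
    public long Id { get; set; }
}

public static class PersonLinq
{
    public static List<PersonInfo> Parse(string input)
    {
        // Same trick as in 1.a.2: wrap the Person elements in a single root element.
        var document = XDocument.Parse($"<root>{input}</root>");

        return document.Root
            .Elements("Person")
            .Select(p => new PersonInfo
            {
                Prenom = (string)p.Element("Prenom"), // null if the element is missing
                Nom = (string)p.Element("Nom"),
                Age = (int)p.Element("Age"),          // throws if <Age> is missing or not a number
                Id = (long)p.Element("Id")
            })
            .ToList();
    }
}
```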

2) Do I really need good performance for this?

To answer that, I used a benchmarking NuGet package to compare the two approaches above. These are the results:

|                Method |    Gen 0 | Allocated |
|---------------------- |---------:|----------:|
|       GetValueByRegex |   1.2207 |    2688 B |
|    GetValueByXmlParse | 115.6006 |  243536 B |

Gen 0 : GC Generation 0 collects per 1000 operations

Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
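For a quick, much rougher check of the allocation column without a benchmarking package, GC.GetAllocatedBytesForCurrentThread() (available on .NET Core 3.0 and later) returns how many bytes the current thread has allocated so far, so the difference around a loop of calls approximates the allocations per call. A minimal sketch; AllocationProbe and BytesPerCall are hypothetical names, and the numbers will not match the table exactly because there is no statistical analysis or process isolation.

```csharp
using System;
using System.Collections.Generic;

public static class AllocationProbe
{
    // Rough average of bytes allocated per call on the current thread.
    // Not a substitute for a real benchmark: no isolation, no outlier handling.
    public static long BytesPerCall(Func<string, List<string>> method, string input, int iterations = 1000)
    {
        method(input); // warm-up call so one-time costs (JIT, regex cache) are excluded

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            method(input);
        }
        long after = GC.GetAllocatedBytesForCurrentThread();

        return (after - before) / iterations;
    }
}
```

To compare the two methods from this answer, pass each one as the method argument with the same input string, e.g. AllocationProbe.BytesPerCall(GetValueByRegex, data) from inside the class that defines them.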

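So the answer is: it depends on what you need to do with the result. In this benchmark the regex approach allocated about 90 times less memory per operation than XDocument (2688 B vs 243536 B). I hope this helps you decide.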

Best Regards

Upvotes: 4
