Reputation: 251
I have a giant XML file and I am supposed to search it for specific strings.
The strings must be exactly 13 characters long and consist only of digits.
I believe a regex is suitable for this, but my knowledge of regex is limited, so any kind of example would be useful.
Also, what other approaches can be used for this kind of string search?
<field name="TKT">
<item>
<index>1</index>
<text>Y24AUGXLOWS 2352159617737</text>
</item>
</field>
<field name="AP FAX">
<item>
<index>1</index>
<text>1 S1 SSRTKNETKHK1 2352159617737C1</text>
</item>
</field>
This is an example of part of the XML I'm talking about. For instance, I would like to extract the number "2352159617737".
Thank you.
Upvotes: 0
Views: 1631
Reputation: 3282
If you want to go the regex-way you could use this expression:
[^\d](\d{13})[^\d]
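This one captures only runs of exactly 13 digits (in group 1). Because [^\d] has to consume one non-digit character on each side, a number at the very start or end of the input will not match.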
Updated with your xml-code
Shortened the expression
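To make the behaviour of this expression concrete, here is a minimal sketch in C#. The sample strings come from the question's XML; the variable names are just for illustration. It shows that the digits end up in group 1, and that the pattern needs a non-digit character on both sides of the number.

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = new Regex(@"[^\d](\d{13})[^\d]");

string[] samples =
{
    "Y24AUGXLOWS 2352159617737",                // bare text: the number ends the string
    "<text>Y24AUGXLOWS 2352159617737</text>",   // raw XML: the number is followed by '<'
    "1 S1 SSRTKNETKHK1 2352159617737C1",        // the number is followed by a letter
};

foreach (string s in samples)
{
    Match m = pattern.Match(s);
    // Group 1 is the 13 digits; the whole match also includes one character on each side.
    Console.WriteLine(m.Success ? m.Groups[1].Value : "(no match)");
}
// Prints:
// (no match)
// 2352159617737
// 2352159617737
```

So the expression works when run over the raw XML text, but would miss the first number if you ran it over the already-extracted element value.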
Upvotes: 2
Reputation: 11228
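@"\d{13}"
will give you 13-digit numbers (including the first 13 digits of any longer digit run):
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
XDocument doc = XDocument.Load(filePath);
var numbers = doc.Root.DescendantNodes().OfType<XText>()
    // XText does not convert to string implicitly, so pass t.Value to Regex
    .Where(t => Regex.IsMatch(t.Value, @"\d{13}"))
    .Select(t => Regex.Match(t.Value, @"\d{13}").Value)
    .ToList();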
Upvotes: 1
Reputation: 817
You can use [^\d](\d{13})[^\d]
regular expression to validate your string. To change the required length, replace 13 with the number of digits you want.
Upvotes: 1
Reputation: 626932
If you only want the numbers from <text>
tags, and other tags may contain similar numbers that you want to avoid matching, use a regex together with an XML parser. Here is an XElement-based solution:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

var xml = "<field name=\"TKT\"> <item> <index>1</index> <text>Y24AUGXLOWS 2352159617737</text> </item> </field> <field name=\"AP FAX\"> <item> <index>1</index> <text>1 S1 SSRTKNETKHK1 2352159617737C1</text> </item> </field>";
// Wrap the fragment in a single root element so it parses as well-formed XML.
var xe = XElement.Parse("<root>" + xml + "</root>");
var res = xe.Descendants("text").Select(p => p.Value).ToList();
var numbers = new List<string>();
foreach (var tag in res)
{
    // 13 digits that are neither preceded nor followed by another digit.
    numbers.AddRange(Regex.Matches(tag, @"(?<!\d)\d{13}(?!\d)").Cast<Match>().Select(n => n.Value));
}
With any regex that deals with "number" extraction, you should understand its boundaries and use it according to your needs:
\d{13} will fetch you 13-digit sequences even if they are part of a longer numeric sequence (1234567890123456 will give you 1234567890123)
(?<!\d)\d{13}(?!\d) will get you all 13-digit sequences that are not preceded or followed by a digit (thus, A1234567890123B is a valid match)
\b\d{13}\b will only match if enclosed by non-word characters or the start/end of the string (only ,1234567890123;-like strings are valid matches)
Upvotes: 2
Reputation: 11233
You may also try this expression:
\b(\d{13})\b
Note that it will capture every standalone 13-digit number in your XML. If you specifically want to target the <text>
nodes, that is also possible with an XPath query. An example taken from here:
root.SelectNodes("html/body/.//*[(name() !='script') and (name()!='style')]/text()[string-length() > 200]")
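The XPath above is written for an HTML document, so here is a rough sketch of the same idea adapted to the XML from the question. It assumes LINQ to XML with the XPathSelectElements extension from System.Xml.XPath; the wrapping <root> element and the variable names are my own additions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Xml.XPath;

// Sample fragment from the question, wrapped in a root element so it parses.
string xml =
    "<root>" +
    "<field name=\"TKT\"><item><index>1</index><text>Y24AUGXLOWS 2352159617737</text></item></field>" +
    "<field name=\"AP FAX\"><item><index>1</index><text>1 S1 SSRTKNETKHK1 2352159617737C1</text></item></field>" +
    "</root>";

XElement root = XElement.Parse(xml);

// Select only the <text> elements, then apply the regex to each element's value.
List<string> numbers = root.XPathSelectElements(".//text")
    .SelectMany(e => Regex.Matches(e.Value, @"\b(\d{13})\b").Cast<Match>())
    .Select(m => m.Groups[1].Value)
    .ToList();

Console.WriteLine(string.Join(", ", numbers));
// Prints: 2352159617737
```

Only one number comes back: in the second <text> the digits are followed directly by "C1", and \b does not see a word boundary between a digit and a letter. If you need that one too, the lookaround form (?<!\d)\d{13}(?!\d) is probably the better fit.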
Upvotes: 1
Reputation: 2629
C4ud3x's answer is right, but I think it can also be done like this: ([0-9]{13})
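One thing to keep in mind is that an expression without any boundary, such as ([0-9]{13}), also matches inside longer digit runs. A small sketch comparing it with a lookaround and a word-boundary variant; the test inputs and the FindAll helper are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every match of `pattern` in `input`.
List<string> FindAll(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToList();

string[] patterns = { @"([0-9]{13})", @"(?<!\d)\d{13}(?!\d)", @"\b\d{13}\b" };
string[] inputs = { "1234567890123456", "A1234567890123B", ",1234567890123;" };

foreach (string p in patterns)
{
    foreach (string s in inputs)
    {
        List<string> found = FindAll(s, p);
        Console.WriteLine($"{p,-22} {s,-18} {(found.Count > 0 ? found[0] : "(no match)")}");
    }
}
// ([0-9]{13}) matches all three inputs; for the 16-digit input it returns 1234567890123.
// (?<!\d)\d{13}(?!\d) rejects the 16-digit input but matches the other two.
// \b\d{13}\b matches only ",1234567890123;".
```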
Upvotes: 0