Çağdaş Umay

Reputation: 251

C# search for specific string with specific length

I have a very large XML file and need to search it for specific strings.

The strings must be exactly 13 characters long and consist only of digits.

I believe a regex is suitable for this, but my knowledge of regex is limited, so any example would be useful.

Also, what other approaches can be used for this kind of string search?

  <field name="TKT">
    <item>
      <index>1</index> 
        <text>Y24AUGXLOWS 2352159617737</text> 
    </item>
  </field>
  <field name="AP FAX">
    <item>
      <index>1</index> 
        <text>1 S1 SSRTKNETKHK1 2352159617737C1</text> 
    </item>
  </field>

This is an example of part of the XML I'm talking about. For instance, I would like to extract the number "2352159617737".

Thank you.

Upvotes: 0

Views: 1631

Answers (6)

C4d

Reputation: 3282

If you want to go the regex way, you could use this expression:

[^\d](\d{13})[^\d]

This one captures (in group 1) only numbers that are exactly 13 digits long and have a non-digit character on each side.
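
A minimal sketch of how this expression might be applied in C#, assuming the text to search is already in a string. The class and method names are made up for illustration. Since the digits sit in capture group 1, the code reads Groups[1] rather than the whole match, which would include the surrounding characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureGroupDemo   // hypothetical helper, not part of the answer
{
    // Returns every 13-digit run that has a non-digit character on both sides.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, @"[^\d](\d{13})[^\d]"))
        {
            // Group 0 would include the two surrounding characters; group 1 is digits only.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

Calling CaptureGroupDemo.Extract("<text>Y24AUGXLOWS 2352159617737</text>") finds the number, because a space precedes it and "<" follows it. Calling it on the bare string "2352159617737" finds nothing, because [^\d] has to consume an actual character on each side.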

Upvotes: 2

w.b

Reputation: 11228

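@"\d{13}" will give you 13-digit numbers:

// requires: using System.Linq; using System.Text.RegularExpressions; using System.Xml.Linq;
XDocument doc = XDocument.Load(filePath);

// XText is not a string, so pass t.Value to the regex; keep the first match per text node
var numbers = doc.Root.DescendantNodes().OfType<XText>()
                      .Select(t => Regex.Match(t.Value, @"\d{13}"))
                      .Where(m => m.Success)
                      .Select(m => m.Value)
                      .ToList();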
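
Since the question mentions a giant XML file, and XDocument.Load builds the whole tree in memory, a forward-only XmlReader might be worth considering. The following is only a sketch under that assumption: the StreamingSearch class is hypothetical, and the look-around pattern is swapped in so that longer digit runs are skipped.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

public static class StreamingSearch   // hypothetical helper, not part of the answer
{
    // 13 digits that are neither preceded nor followed by another digit.
    private static readonly Regex ThirteenDigits = new Regex(@"(?<!\d)\d{13}(?!\d)");

    // Reads the XML forward-only and collects 13-digit runs from text nodes.
    public static List<string> Find(TextReader source)
    {
        var found = new List<string>();
        using (XmlReader reader = XmlReader.Create(source))
        {
            while (reader.Read())
            {
                // Skip elements, attributes, whitespace, and so on.
                if (reader.NodeType != XmlNodeType.Text) continue;

                foreach (Match m in ThirteenDigits.Matches(reader.Value))
                    found.Add(m.Value);
            }
        }
        return found;
    }
}
```

It could be called as StreamingSearch.Find(File.OpenText(filePath)). Unlike the LINQ query above, this collects every match in a text node, not just the first one.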

Upvotes: 1

rafat

Reputation: 817

You can use the regular expression [^\d](\d{13})[^\d] to match your string. To change the required length, replace 13 with whatever length you need.

Upvotes: 1

Wiktor Stribiżew

Reputation: 626932

If you only expect the numbers inside <text> tags, and other tags may contain similar numbers that you want to avoid matching, combine a regex with an XML parser. Here is an XElement-based solution:

var xml = "<field name=\"TKT\"> - <item> <index>1</index> <text>Y24AUGXLOWS 2352159617737</text> </item> </field> - <field name=\"AP FAX\"> - <item> <index>1</index> <text>1 S1 SSRTKNETKHK1 2352159617737C1</text> </item> </field>";
var xe = XElement.Parse("<root>" + xml + "</root>");
var res = xe.Descendants("text").Select(p => p.Value).ToList();
var numbers = new List<string>();
foreach (var tag in res)
{ 
    numbers.AddRange(Regex.Matches(tag, @"(?<!\d)\d{13}(?!\d)").Cast<Match>().Select(n => n.Value).ToList());
}

With any regex that deals with "number" extraction, you should understand its boundaries and use it according to your needs:

  • \d{13} will fetch you 13-digit sequences even if they are part of a longer numeric sequence (1234567890123456 will give you 1234567890123)
  • (?<!\d)\d{13}(?!\d) will get you all 13-digit sequences that are neither preceded nor followed by a digit (thus, A1234567890123B is a valid match)
  • \b\d{13}\b will only match if the digits are not adjacent to other word characters (letters, digits, underscore), so only ,1234567890123;-like strings are valid matches and A1234567890123B is not
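
The difference between the three patterns above can be checked with a small helper. This is just an illustrative sketch; BoundaryDemo and Find are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryDemo   // hypothetical helper, not part of the answer
{
    // Returns all matches of the pattern in the input as plain strings.
    public static string[] Find(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

With the 16-digit input "1234567890123456", the pattern \d{13} returns "1234567890123", while (?<!\d)\d{13}(?!\d) returns nothing. With "A1234567890123B", the look-around pattern returns "1234567890123", but \b\d{13}\b returns nothing because the letters are word characters. With ",1234567890123;" the \b version returns "1234567890123".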

Upvotes: 2

NeverHopeless

Reputation: 11233

You may also try this expression:

\b(\d{13})\b

Note that it will capture all 13-digit text in your XML. If you specifically want to target the <text> nodes, that is also possible with an XPath query. An example of such a query, taken from elsewhere:

root.SelectNodes("html/body/.//*[(name() !='script') and (name()!='style')]/text()[string-length() > 200]")
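
For the XML in the question, the same idea might look roughly like the sketch below. It assumes the fragment is wrapped in a single root element, that "//text" is the right path for the real document, and the XPathSearch class is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

public static class XPathSearch   // hypothetical helper, not part of the answer
{
    // Selects every <text> element with XPath, then applies the regex to its content.
    public static List<string> Find(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var found = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//text"))
        {
            foreach (Match m in Regex.Matches(node.InnerText, @"\b(\d{13})\b"))
                found.Add(m.Groups[1].Value);
        }
        return found;
    }
}
```

Be aware that with \b the second sample value, "2352159617737C1", is not matched, because the digits are directly followed by a letter. Only the number in the TKT field is returned.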

Upvotes: 1

ivamax9

Reputation: 2629

C4ud3x's answer is actually right, but I think it can also be done like this: ([0-9]{13})
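
One practical difference between [0-9] and \d in .NET: unless RegexOptions.ECMAScript is set, \d also matches decimal digits from other scripts, while [0-9] is limited to ASCII. A small sketch to illustrate, with hypothetical names and anchors added so that the whole string is tested:

```csharp
using System.Text.RegularExpressions;

public static class DigitClassDemo   // hypothetical helper, not part of the answer
{
    // True when the whole input is exactly 13 ASCII digits.
    public static bool IsAscii13(string s) => Regex.IsMatch(s, @"^[0-9]{13}\z");

    // True when the whole input is exactly 13 Unicode decimal digits.
    public static bool IsUnicode13(string s) => Regex.IsMatch(s, @"^\d{13}\z");
}
```

Both methods accept "2352159617737". A string of 13 Arabic-Indic digits such as new string('\u0663', 13) is accepted by IsUnicode13 but rejected by IsAscii13.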

Upvotes: 0
