Reputation: 125
I have lots of strings that all start like CIDR IP addresses (e.g. 192.168.1.104), but some of them have random letters at the end (e.g. 192.168.1.104kadjwneqb). Is there a way to split these strings at the first occurrence of a letter without using regex? Regexes are too expensive to compute because I need to process a lot of these. Thank you in advance.
Upvotes: 0
Views: 70
Reputation: 48686
Probably the easiest way to do this using no RegEx or loops is something like this:
using System;

public class Program
{
    public static void Main()
    {
        string inputIp = "192.168.12.127fjieif34f";

        // Index of the first digit in the string.
        int firstNumber = inputIp.IndexOfAny("0123456789".ToCharArray());

        // Index of the first letter at or after that digit.
        int firstAlpha = inputIp.IndexOfAny("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".ToCharArray(), firstNumber);

        // Everything between the two indexes is the IP.
        string ip = inputIp.Substring(firstNumber, firstAlpha - firstNumber);
        Console.WriteLine("The IP is " + ip);
    }
}
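This is how it works: the first IndexOfAny call finds the index of the first digit (firstNumber), the second IndexOfAny call finds the index of the first letter at or after that position (firstAlpha), and Substring extracts the IP nestled between firstNumber and firstAlpha.
This simple example doesn't do any kind of checking, which you might want to add (such as checking the return value of IndexOfAny, which is -1 when nothing is found).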
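As a rough sketch of the checking mentioned above, the same IndexOfAny approach could be wrapped in a helper that handles the -1 cases. The class name `IpExtractor` and the method `TryExtractIp` are hypothetical names chosen for illustration, and caching the character arrays in static fields is my own assumption about what helps when processing many strings.

```csharp
using System;

public static class IpExtractor
{
    // Built once, so the arrays are not re-created for every input string.
    private static readonly char[] Digits = "0123456789".ToCharArray();
    private static readonly char[] Letters =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".ToCharArray();

    // Hypothetical helper: returns false when the input contains no digit.
    // When no letter follows the first digit, the rest of the string is used.
    public static bool TryExtractIp(string input, out string ip)
    {
        ip = string.Empty;
        if (string.IsNullOrEmpty(input))
            return false;

        int firstNumber = input.IndexOfAny(Digits);
        if (firstNumber < 0)
            return false; // IndexOfAny returns -1 when nothing is found.

        int firstAlpha = input.IndexOfAny(Letters, firstNumber);
        ip = firstAlpha < 0
            ? input.Substring(firstNumber)
            : input.Substring(firstNumber, firstAlpha - firstNumber);
        return true;
    }
}
```

With this version, an input without trailing letters such as "192.168.1.104" is returned unchanged instead of throwing from Substring.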
Upvotes: 1
Reputation: 109567
According to my testing, a custom loop is around five times faster than the regex you're using (although there are likely regexes that could be a bit faster).
I tested using BenchmarkDotNet, the result being:
| Method | Mean | Error | StdDev |
|-------------------------------------- |----------:|----------:|----------:|
| BenchTruncateAtLastDigitViaCustomCode | 4.846 ms | 0.0531 ms | 0.0652 ms |
| BenchTruncateAtLastDigitViaRegex | 21.886 ms | 0.2421 ms | 0.2265 ms |
And the test code:
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;

namespace CoreConsoleA
{
    public class UnderTest
    {
        public UnderTest()
        {
            var rng = new Random(12456); // Seeded RNG, so same data every run.

            // Test with 100,000 IP addresses of which 10% have between 1 and 20 extra characters.
            var data = new byte[4];
            int n = 100_000;

            for (int i = 0; i < n; ++i)
            {
                rng.NextBytes(data); // Create a random 4-byte (IPv4) address.
                IPAddress ipAddress = new IPAddress(data);

                if (rng.NextDouble() < 0.1)
                {
                    int extra = rng.Next(1, 21);
                    _strings.Add(ipAddress + new string('x', extra));
                }
                else
                {
                    _strings.Add(ipAddress.ToString());
                }
            }
        }

        [Benchmark]
        public void BenchTruncateAtLastDigitViaCustomCode()
        {
            foreach (var s in _strings)
            {
                TruncateAtLastDigitViaCustomCode(s);
            }
        }

        [Benchmark]
        public void BenchTruncateAtLastDigitViaRegex()
        {
            foreach (var s in _strings)
            {
                TruncateAtLastDigitViaRegex(s);
            }
        }

        public string TruncateAtLastDigitViaCustomCode(string s)
        {
            return s.Substring(0, IndexOfLastDigitViaCustomCode(s));
        }

        public string TruncateAtLastDigitViaRegex(string s)
        {
            return s.Substring(0, IndexOfLastDigitViaRegex(s));
        }

        // Returns the index of the first character that is neither a digit nor '.',
        // or s.Length if every character is one of those.
        public static int IndexOfLastDigitViaCustomCode(string s)
        {
            for (int i = 0; i < s.Length; ++i)
            {
                char c = s[i];

                if (!char.IsDigit(c) && c != '.')
                    return i;
            }

            return s.Length;
        }

        public int IndexOfLastDigitViaRegex(string s)
        {
            // Match.Index is 0 both for "no match" and for a match at position 0,
            // so test Success rather than the index itself.
            Match match = _ipTruncate.Match(s);
            return match.Success ? match.Index : s.Length;
        }

        readonly Regex _ipTruncate = new Regex("[^0-9.]", RegexOptions.Compiled);
        readonly List<string> _strings = new List<string>();
    }
}
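To make the behaviour of the custom loop concrete, here is a small standalone sketch of the same logic as IndexOfLastDigitViaCustomCode, without the BenchmarkDotNet dependency. The class name `TruncateDemo` and the method names are placeholders of mine, not part of the benchmark.

```csharp
using System;

public static class TruncateDemo
{
    // Index of the first character that is neither a digit nor '.',
    // or s.Length when the whole string consists of digits and dots.
    public static int IndexOfFirstNonIpChar(string s)
    {
        for (int i = 0; i < s.Length; ++i)
        {
            char c = s[i];

            if (!char.IsDigit(c) && c != '.')
                return i;
        }

        return s.Length;
    }

    // Keeps only the leading run of digits and dots.
    public static string Truncate(string s)
    {
        return s.Substring(0, IndexOfFirstNonIpChar(s));
    }
}
```

Calling TruncateDemo.Truncate("192.168.1.104kadjwneqb") returns "192.168.1.104", and a string with no trailing characters comes back unchanged. Note that this only strips the suffix; it does not validate that what remains is a well-formed address.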
Upvotes: 2
Reputation: 26782
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
Several comments claimed that regex should be fine, so I tested it. Compiled in release mode, on my machine the regex approach is about 10x slower. That may still be OK, depending on the context of course. But IMO the naive implementation (which just looks for the first character that is neither a digit nor a dot, then returns the substring before it) is also way simpler to understand. YMMV.
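using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class Program
{
    static void Main(string[] args)
    {
        var input = "192.168.1.104kadjwneqb";
        Timed(() => GetIp1(input));
        Timed(() => GetIp2(input));
    }

    static Regex regex = new Regex(@"^(?<cidr>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})[a-zA-Z]*$", RegexOptions.Compiled);

    // Runs the action 10 million times and prints the elapsed milliseconds.
    static void Timed(Action a)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10_000_000; i++)
            a();
        Console.WriteLine(sw.ElapsedMilliseconds);
    }

    // Naive approach: scan until the first character that is neither a digit nor '.'.
    static string GetIp1(string input)
    {
        int i = 0;
        // The length check keeps the loop in bounds when there is no trailing text.
        while (i < input.Length && (char.IsDigit(input[i]) || input[i] == '.')) i++;
        return input.Substring(0, i);
    }

    // Regex approach: capture the dotted quad in the named group.
    static string GetIp2(string input)
    {
        var m = regex.Match(input);
        return m.Groups["cidr"].Value;
    }
}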
Upvotes: 1