Reputation: 785
NOTE: I did ask the same question here but since some people have marked it as duplicate though it had some crafty, neat solutions, I had to create this extra(dupe) question to make it easier for others who are facing similar doubts. Added the question based on the suggestion of fellow stack overflow members.
What is the efficient way to parse through a large delimited string so that I can access just one element from the delimited set without having to store the other substrings involved?
I specifically am not interested in storing the rest of the element values as done when using Split() method since all of this information is irrelevant to the problem at hand. Also, I want to save memory in doing the same.
Problem Statement:
Given the exact delimited position, I need to extract the element contained in that given position in the most efficient way in terms of memory consumed and time taken.
Simple example string: "1,2,3,4,....,21,22,23,24"
Delimter: ,
Delimited Position: 22
Answer expected: 23
Another example string:
"61d2e3f6-bcb7-4cd1-a81e-4f8f497f0da2;0;192.100.0.102:4362;2014-02-14;283;0;354;23;0;;;""0x8D15A2913C934DE"";Thursday, 19-Jun-14 22:58:10 GMT;"
Delimiter: ;
Delimited Position: 7
Expected Answer: 23
Upvotes: 2
Views: 738
Reputation: 52240
If you want to be sure the code parses the string in only one pass, and only parses what is needed, you can write the routine that iterates over the string yourself.
Since all c# strings implement IEnumerable<char>
it is fairly straightforward to devise a method that requires zero string allocations:
static public IEnumerable<char> GetDelimitedField(this IEnumerable<char> source, char delimiter, int index)
{
foreach (var c in source)
{
if (c == delimiter)
{
if (--index < 0) yield break;
}
else
{
if (index == 0) yield return c;
}
}
}
This returns the result as an IEnumerable<char>
but it's cheap to convert to a string. It's going to be a much shorter string at this point anyway.
static public string GetDelimitedString(this string source, char delimiter, int index)
{
var result = source.GetDelimitedField(delimiter, index);
return new string(result.ToArray());
}
And you can call it like this:
var input ="Zero,One,Two,Three,Four,Five,Six";
var output = input.GetDelimitedString(',',5);
Console.WriteLine(output);
Output:
Five
Upvotes: 1
Reputation: 1073
Too late for "answer" but this code gives me a run time of about 0.75 seconds with both strings processed 1,000,000 times. Difference this time is that now I'm not Marshaling an object but using pointers.
And this time I am returning a single new string (String.Substring).
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
class Program
{
static void Main(string[] args)
{
string testString1 = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24";
string testString2 = "61d2e3f6-bcb7-4cd1-a81e-4f8f497f0da2;0;192.100.0.102:4362;2014-02-14;283;0;354;23;0;;;\"0x8D15A2913C934DE\";Thursday, 19-Jun-14 22:58:10 GMT;";
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 1; i < 1000000; i++)
{
Delimit(testString1, ',', 22);
Delimit(testString2, ';', 6);
}
sw.Stop();
Console.WriteLine($"==>{sw.ElapsedMilliseconds}");
Console.ReadLine();
}
static string Delimit(string stringUnderTest, char delimiter, int skipCount)
{
const int SIZE_OF_UNICHAR = 2;
int i = 0;
int index = 0;
char c = Char.MinValue;
GCHandle handle = GCHandle.Alloc(stringUnderTest, GCHandleType.Pinned);
try
{
IntPtr ptr = handle.AddrOfPinnedObject();
for (i = 0; i < skipCount; i++)
while ((char)Marshal.ReadByte(ptr, index += SIZE_OF_UNICHAR) != delimiter) ;
i = index;
while ((c = (char)Marshal.ReadByte(ptr, i += SIZE_OF_UNICHAR)) != delimiter) ;
}
finally
{
if (handle.IsAllocated)
handle.Free();
}
return stringUnderTest.Substring((index + SIZE_OF_UNICHAR) >> 1, (i - index - SIZE_OF_UNICHAR) >> 1);
}
}
Upvotes: -1
Reputation: 764
By using the following Regex : ^([^;]*;){21}(.*?);
, with that you don't have to generate the hole split list to search for your desired position, and once you reach it, it gonna be a matter of whether exists or not.
Explanation :
^ --> start of a line.
([^;]*;){Position - 1} --> notice that the symbol ; here is the delimiter, the expression will loop Pos - 1 times
(.*?) --> Non-Greedy .*
For more about regular expressions on C# : documentation
In the example below i did implemant the two samples to show you how it works.
Match Method : documentation (Basically it searchs only for the first occurence of the pattern) RegexOptions.Singleline : Treats the input as a signle line.
C# Code
Console.WriteLine("First Delimiter : ");
int Position = 22;
char delimiter = ',';
string pattern = @"^([^" + delimiter + "]*" + delimiter + "){" + (Position - 1) + @"}(.*?)" + delimiter;
Regex regex = new Regex(pattern, RegexOptions.Singleline);
// First Example
string Data = @"AAV,zzz,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22ABC,23,24,24";
Match Re = regex.Match(Data);
if (Re.Groups.Count > 0)
Console.WriteLine("\tMatch found : " + Re.Groups[2]);
// Second Example
Console.WriteLine("Second Delimiter : ");
Position = 8;
delimiter = ';';
pattern = @"^([^" + delimiter + "]*" + delimiter + "){" + (Position - 1) + @"}(.*?)" + delimiter;
Data = @"61d2e3f6-bcb7-4cd1-a81e-4f8f497f0da2;0;192.100.0.102:4362;2014-02-14;283;0;354;23;0;;;""0x8D15A2913C934DE"";Thursday, 19-Jun-14 22:58:10 GMT;";
regex = new Regex(pattern, RegexOptions.Singleline);
Re = regex.Match(Data);
if (Re.Groups.Count > 0)
Console.WriteLine("\tMatch found : " + Re.Groups[2]);
Output :
First Delimiter :
Match found : 22ABC
Second Delimiter :
Match found : 23
Upvotes: 1
Reputation: 30565
try this:
public static string MyExtension(this string s, char delimiter, int n)
{
var begin = n== 0 ? 0 : Westwind.Utilities.StringUtils.IndexOfNth(s, delimiter, n);
if (begin == -1)
return null;
var end = s.IndexOf(delimiter, begin + (n==0?0:1));
if (end == -1 ) end = s.Length;
//var end = Westwind.Utilities.StringUtils.IndexOfNth(s, delimiter, n + 1);
var result = s.Substring(begin +1, end - begin -1 );
return result;
}
PS: Library used is Westwind.Utilities
Benchmark Code:
void Main()
{
string s = string.Join(";", Enumerable.Range(65, 26).Select(c => (char)c));
s = s.Insert(3, ";;;");
string o = "";
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 1; i <= 1000000; i++) {
o = s.Split(';', 21);
}
sw.Stop();
Console.WriteLine("Item directly selected: " + sw.ElapsedMilliseconds);
sw.Restart();
for (int i = 1; i <= 1000000; i++) {
o = s.MyExtension(';', 21);
}
sw.Stop();
Console.WriteLine("Item directly selected by MyExtension: " + sw.ElapsedMilliseconds);
sw.Restart();
for (int i = 1; i <= 1000000; i++) {
o = s.Split(';')[21];
}
sw.Stop();
Console.WriteLine("Item from split array: " + sw.ElapsedMilliseconds + "\r\n");
Console.WriteLine(s);
Console.WriteLine(o);
}
public static class MyExtensions
{
/// <summary>
/// Get the nth item from a delimited string.
/// </summary>
/// <param name="s">The string to retrieve a delimited item from.</param>
/// <param name="delimiter">The character used as the item delimiter.</param>
/// <param name="n">Zero-based index of item to return.</param>
/// <returns>The nth item or an empty string.</returns>
public static string Split(this string s, char delimiter, int n)
{
int pos = pos = s.IndexOf(delimiter);
if (n == 0 || pos < 0)
{ return (pos >= 0) ? s.Substring(0, pos) : s; }
int nDelims = 1;
while (nDelims < n && pos >= 0)
{
pos = s.IndexOf(delimiter, pos + 1);
nDelims++;
}
string result = "";
if (pos >= 0)
{
int nextDelim = s.IndexOf(delimiter, pos + 1);
result = (nextDelim < 0) ? s.Substring(pos + 1) : s.Substring(pos + 1, nextDelim - pos - 1);
}
return result;
}
public static string MyExtension(this string s, char delimiter, int n)
{
var begin = n== 0 ? 0 : Westwind.Utilities.StringUtils.IndexOfNth(s, delimiter, n);
if (begin == -1)
return null;
var end = s.IndexOf(delimiter, begin + (n==0?0:1));
if (end == -1 ) end = s.Length;
//var end = Westwind.Utilities.StringUtils.IndexOfNth(s, delimiter, n + 1);
var result = s.Substring(begin +1, end - begin -1 );
return result;
}
}
Results:
Item directly selected: 277
Item directly selected by MyExtension: 114
Item from split array: 1297
A;B;;;;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;R;S;T;U;V;W;X;Y;Z
S
Edit: Thanks to @Kalten, I enhanced solution further. Considerable difference has been seen on benchmark results.
Upvotes: 2
Reputation: 25013
There are some useful remarks relevant to this problem in the documentation for String.Split, although I wrote the following before discovering that.
One way to do it is to find a delimiter with String.IndexOf method - you can specify the index to start the search from, so it is possible to skip along the items without having to examine every character. (The examination of every character happens behind the scenes, but it's a little bit faster than doing it yourself.)
I made up an extension method by adding a new class named "ExtensionMethods.cs" to the solution with this content:
namespace ExtensionMethods
{
public static class MyExtensions
{
/// <summary>
/// Get the nth item from a delimited string.
/// </summary>
/// <param name="s">The string to retrieve a delimited item from.</param>
/// <param name="delimiter">The character used as the item delimiter.</param>
/// <param name="n">Zero-based index of item to return.</param>
/// <returns>The nth item or an empty string.</returns>
public static string Split(this string s, char delimiter, int n)
{
int pos = pos = s.IndexOf(delimiter);
if (n == 0 || pos < 0)
{ return (pos >= 0) ? s.Substring(0, pos) : s; }
int nDelims = 1;
while (nDelims < n && pos >= 0)
{
pos = s.IndexOf(delimiter, pos + 1);
nDelims++;
}
string result = "";
if (pos >= 0)
{
int nextDelim = s.IndexOf(delimiter, pos + 1);
result = (nextDelim < 0) ? s.Substring(pos + 1) : s.Substring(pos + 1, nextDelim - pos - 1);
}
return result;
}
}
}
And a small program to test it:
using System;
using System.Diagnostics;
using System.Linq;
using ExtensionMethods;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
// test data...
string s = string.Join(";", Enumerable.Range(65, 26).Select(c => (char)c));
s = s.Insert(3, ";;;");
string o = "";
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 1; i <= 1000000; i++) {
o = s.Split(';', 21);
}
sw.Stop();
Console.WriteLine("Item directly selected: " + sw.ElapsedMilliseconds);
sw.Restart();
for (int i = 1; i <= 1000000; i++) {
o = s.Split(';')[21];
}
sw.Stop();
Console.WriteLine("Item from split array: " + sw.ElapsedMilliseconds + "\r\n");
Console.WriteLine(s);
Console.WriteLine(o);
Console.ReadLine();
}
}
}
Sample output:
Item directly selected: 1016
Item from split array: 1345A;B;;;;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;R;S;T;U;V;W;X;Y;Z
S
Reference: How to: Implement and Call a Custom Extension Method (C# Programming Guide)
Upvotes: 2