Reputation: 2243
I have a string like
string Text = "012345678901234567890123456789";
and a List<int>
with indexes
List<int> Indexes = new List<int>() { 2, 4, 7, 9, 15, 18, 23, 10, 1, 2, 15, 40 };
with following restrictions
Text.length
what's the best way to remove characters from the text which are in the index list?
expected output:
035681234679012456789
Is there a more efficent way than
foreach (int index in Indexes
.OrderByDescending(x => x)
.Distinct()
.Where(x => x < Text.Length))
{
Text = Text.Remove(index, 1);
}
Update:
Here are the Benchmarks of the current answers (string
with 100.000 characters and List<int>
with length 10.000:
Gallant: 3.322 ticks
Tim Schmelter: 8.602.576 ticks
Sergei Zinovyev: 9.002 ticks
rbaghbanli: 7.137 ticks
Jirí Tesil Tesarík: 72.580 ticks
Upvotes: 14
Views: 1090
Reputation: 11
A modified solution using byte (may be replaced by boolean) array instead of hash table. PROS: linear complexity, CONS: needs extra memory for flag array.
string Text = "012345678901234567890123456789";
List<int> Indexes = new List<int>() { 2, 4, 7, 9, 15, 18, 23, 10, 1, 2, 15, 40 };
byte[] contains = new byte[Text.Length];
Indexes.ForEach(p=> {if ( p<Text.Length) contains[p]=1;});
var output = string.Concat(Enumerable.Range(0, Text.Length).Where(p => contains[p] != 1).Select(p => Text[p]));
Upvotes: 0
Reputation: 166
The following is making an assumption that your string contains a known set of characters. If you know for certain that, for example, Unicode character
never appears in the string, you can use it as a placeholder to mark which characters to remove. This should be very fast in exchange for this limitation:
char temp = '\uFFF0';
StringBuilder sb = new StringBuilder(Text);
for (int i = 0; i < Indexes.Count; i++)
{
if (Indexes[i] < sb.Length)
{
sb[Indexes[i]] = temp;
}
}
Text = sb.Replace(temp.ToString(), null).ToString();
This appears to be anywhere between 3-4 times faster than building a HashSet like some other answers have suggested. http://ideone.com/mUILHg
If you cannot make the above assumption, you can build an array to contain this extra bit of data instead of using a unique character. This does two rounds of iteration (so it's a bit slower), but it is still O(n) efficiency (so it should typically be faster than putting Indexes into a hashmap before iterating).
bool[] exclude = new bool[Text.Length];
for (int i = 0; i < Indexes.Count; i++)
{
if (Indexes[i] < exclude.Length)
{
exclude[Indexes[i]] = true;
}
}
StringBuilder sb = new StringBuilder(Text.Length);
for (int i = 0; i < Text.Length; i++)
{
if (!exclude[i])
{
sb.Append(Text[i]);
}
}
Text = sb.ToString();
Quick benchmarks: http://ideone.com/3d2uPH
Upvotes: 5
Reputation: 1286
This will work faster:
string Text = "012345678901234567890123456789";
List<int> Indexes = new List<int>() { 2, 4, 7, 9, 15, 18, 23, 10, 1, 2, 15, 40 };
HashSet<int> hashSet = new HashSet<int>(Indexes);
StringBuilder sb = new StringBuilder(Text.Length);
for (int i = 0; i < Text.Length; ++i)
{
if (!hashSet.Contains(i))
{
sb.Append(Text[i]);
}
}
string str = sb.ToString();
Upvotes: 9
Reputation: 3319
Yes, see code below (it will iterate only once over each sequence):
var map = new short[Text.Length];
foreach (var i in Indexes)
{
if (i < text.Count)
map[i] = 1;
}
Text = new string(Text.Where((c, i) => map[i] == 0).ToArray());
Upvotes: 7
Reputation: 460018
Here's a more or less elegant LINQ way:
Text = new string(Text.Where((c, index) => !Indexes.Contains(index)).ToArray());
It uses the overload of Enumerable.Where
that projects the index of the item in the sequence.
If you want the most efficient and not the most readable way and the text is really large you could use a HashSet<int>
instead of the list which doesn't allow duplicates and a StringBuilder
to create the new string:
var indexSet = new HashSet<int>(Indexes); // either create from the list(as shown here) or use it without your list
var textBuilder = new StringBuilder(Text.Length);
for(int i = 0; i < Text.Length; i++)
if (!indexSet.Contains(i))
textBuilder.Append(Text[i]);
Text = textBuilder.ToString();
Of course you could also use the HashSet<int>
in the LINQ approach to make it more efficient.
Upvotes: 11