Reputation: 2243

Remove chars from string

I have a string like

string Text = "012345678901234567890123456789";

and a List<int> with indexes

List<int> Indexes = new List<int>() { 2, 4, 7, 9, 15, 18, 23, 10, 1, 2, 15, 40 };

with following restrictions

there are duplicates within the list
the list is not sorted
there may be indexes > Text.length

what's the best way to remove characters from the text which are in the index list?

expected output:

035681234679012456789

Is there a more efficent way than

foreach (int index in Indexes
                        .OrderByDescending(x => x)
                        .Distinct()
                        .Where(x => x < Text.Length))
{
    Text = Text.Remove(index, 1);
}

Update: Here are the Benchmarks of the current answers (string with 100.000 characters and List<int> with length 10.000:

Gallant: 3.322 ticks
Tim Schmelter: 8.602.576 ticks
Sergei Zinovyev: 9.002 ticks
rbaghbanli: 7.137 ticks
Jirí Tesil Tesarík: 72.580 ticks

Upvotes: 14

Answers (5)

Jiří Tesil Tesařík

Reputation: 11

A modified solution using byte (may be replaced by boolean) array instead of hash table. PROS: linear complexity, CONS: needs extra memory for flag array.

string Text = "012345678901234567890123456789";
List<int> Indexes = new List<int>() { 2, 4, 7, 9, 15, 18, 23, 10, 1, 2, 15, 40 };
byte[] contains = new byte[Text.Length];
Indexes.ForEach(p=> {if ( p<Text.Length) contains[p]=1;});
var output = string.Concat(Enumerable.Range(0, Text.Length).Where(p => contains[p] != 1).Select(p => Text[p]));

Upvotes: 0

Gallant

Reputation: 166

The following is making an assumption that your string contains a known set of characters. If you know for certain that, for example, Unicode character ￰ never appears in the string, you can use it as a placeholder to mark which characters to remove. This should be very fast in exchange for this limitation:

char temp = '\uFFF0';
StringBuilder sb = new StringBuilder(Text);
for (int i = 0; i < Indexes.Count; i++)
{
    if (Indexes[i] < sb.Length)
    {
        sb[Indexes[i]] = temp;
    }
}

Text = sb.Replace(temp.ToString(), null).ToString();

This appears to be anywhere between 3-4 times faster than building a HashSet like some other answers have suggested. http://ideone.com/mUILHg

If you cannot make the above assumption, you can build an array to contain this extra bit of data instead of using a unique character. This does two rounds of iteration (so it's a bit slower), but it is still O(n) efficiency (so it should typically be faster than putting Indexes into a hashmap before iterating).

bool[] exclude = new bool[Text.Length];
for (int i = 0; i < Indexes.Count; i++)
{
    if (Indexes[i] < exclude.Length)
    {
        exclude[Indexes[i]] = true;
    }
}
StringBuilder sb = new StringBuilder(Text.Length);
for (int i = 0; i < Text.Length; i++)
{
    if (!exclude[i])
    {
        sb.Append(Text[i]);
    }
}
Text = sb.ToString();

Quick benchmarks: http://ideone.com/3d2uPH

Upvotes: 5

Sergei Zinovyev

Reputation: 1286

This will work faster:

string Text = "012345678901234567890123456789";
List<int> Indexes = new List<int>() { 2, 4, 7, 9, 15, 18, 23, 10, 1, 2, 15, 40 };

HashSet<int> hashSet = new HashSet<int>(Indexes);

StringBuilder sb = new StringBuilder(Text.Length);
for (int i = 0; i < Text.Length; ++i)
{
    if (!hashSet.Contains(i))
    {
        sb.Append(Text[i]);
    }
}

string str = sb.ToString();

Upvotes: 9

Riad Baghbanli

Reputation: 3319

Yes, see code below (it will iterate only once over each sequence):

var map = new short[Text.Length];
foreach (var i in Indexes)
{
    if (i < text.Count)
        map[i] = 1;
}
Text = new string(Text.Where((c, i) => map[i] == 0).ToArray());

Upvotes: 7

Tim Schmelter

Reputation: 460018

Here's a more or less elegant LINQ way:

Text = new string(Text.Where((c, index) => !Indexes.Contains(index)).ToArray());

It uses the overload of Enumerable.Where that projects the index of the item in the sequence.

If you want the most efficient and not the most readable way and the text is really large you could use a HashSet<int> instead of the list which doesn't allow duplicates and a StringBuilder to create the new string:

var indexSet = new HashSet<int>(Indexes); // either create from the list(as shown here) or use it without your list
var textBuilder = new StringBuilder(Text.Length);

for(int i = 0; i < Text.Length; i++)
    if (!indexSet.Contains(i))
        textBuilder.Append(Text[i]);
Text = textBuilder.ToString();

Of course you could also use the HashSet<int> in the LINQ approach to make it more efficient.

Upvotes: 11

Remove chars from string

Answers (5)

Related Questions