Reputation: 26611
Anyone know how I would find & replace text in a string? Basically I have two strings:
string firstS = "/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDABQODxIPDRQSERIXFhQYHzMhHxwcHz8tLyUzSkFOTUlBSEZSXHZkUldvWEZIZoxob3p9hIWET2ORm4+AmnaBhH//2wBDARYXFx8bHzwhITx/VEhUf39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f3//";
string secondS = "abcdefg2wBDABQODxIPDRQSERIXFh/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/abcdefg";
I want to search firstS
to see if it contains any sequence of characters that's in secondS
and then replace it. It also needs to be replaced with the number of replaced characters in squared brackets:
[NUMBER-OF-CHARACTERS-REPLACED]
For example, because firstS
and secondS
both contain "2wBDABQODxIPDRQSERIXFh" and "/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/" they would need to be replaced. So then firstS
becomes:
string firstS = "/9j/4AAQSkZJRgABAQEAYABgAAD/[22]QYHzMhHxwcHz8tLyUzSkFOTUlBSEZSXHZkUldvWEZIZoxob3p9hIWET2ORm4+AmnaBhH//2wBDARYXFx8bHzwhITx/VEhUf39[61]f3//";
Hope that makes sense. I think I could do this with Regex, but I don't like the inefficiency of it. Does anyone know of another, faster way?
Upvotes: 1
Views: 513
Reputation: 17508
This is probably dog slow, but if you're willing to incur some technical debt and need something now for prototyping, you could use LINQ.
string firstS = "123abc";
string secondS = "456cdeabc123";
int minLength = 3;
var result =
from subStrCount in Enumerable.Range(0, firstS.Length)
where firstS.Length - subStrCount >= 3
let subStr = firstS.Substring(subStrCount, 3)
where secondS.Contains(subStr)
select secondS.Replace(subStr, "[" + subStr.Length + "]");
Results in
456cdeabc[3]
456cde[3]123
Upvotes: 0
Reputation: 727017
Does anyone know of another, faster way?
Yes, this problem actually has a proper name. It is called the Longest Common Substring, and it has a reasonably fast solution.
Here is an implementation on ideone. It finds and replaces all common substrings of ten characters or longer.
// This comes straight from Wikipedia article linked above:
private static string FindLcs(string s, string t) {
var L = new int[s.Length, t.Length];
var z = 0;
var ret = new StringBuilder();
for (var i = 0 ; i != s.Length ; i++) {
for (var j = 0 ; j != t.Length ; j++) {
if (s[i] == t[j]) {
if (i == 0 || j == 0) {
L[i,j] = 1;
} else {
L[i,j] = L[i-1,j-1] + 1;
}
if (L[i,j] > z) {
z = L[i,j];
ret = new StringBuilder();
}
if (L[i,j] == z) {
ret.Append(s.Substring( i-z+1, z));
}
} else {
L[i,j]=0;
}
}
}
return ret.ToString();
}
// With the LCS in hand, building the answer is easy
public static string CutLcs(string s, string t) {
for (;;) {
var lcs = FindLcs(s, t);
if (lcs.Length < 10) break;
s = s.Replace(lcs, string.Format("[{0}]", lcs.Length));
}
return s;
}
Upvotes: 3
Reputation: 570
You need to be very careful between "Longest common substring and "longest common subsequence"
For Substring: http://en.wikipedia.org/wiki/Longest_common_substring_problem
For SubSequence: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
I would suggest you to also see few videos on youtube on these two topics http://www.youtube.com/results?search_query=longest+common+substring&oq=longest+common+substring&gs_l=youtube.3..0.3834.10362.0.10546.28.17.2.9.9.2.225.1425.11j3j3.17.0...0.0...1ac.lSrzx8rr1kQ
you can find c# implementation of longest common subsequence here:
http://www.alexandre-gomes.com/?p=177
http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_subsequence
Upvotes: 1
Reputation: 929
I have a similar issue, but for word occurrences! so, I hope this can help. I used SortedDictionary
and a binary search tree
/* Application counts the number of occurrences of each word in a string
and stores them in a generic sorted dictionary. */
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;
public class SortedDictionaryTest
{
public static void Main( string[] args )
{
// create sorted dictionary
SortedDictionary< string, int > dictionary = CollectWords();
// display sorted dictionary content
DisplayDictionary( dictionary );
}
// create sorted dictionary
private static SortedDictionary< string, int > CollectWords()
{
// create a new sorted dictionary
SortedDictionary< string, int > dictionary =
new SortedDictionary< string, int >();
Console.WriteLine( "Enter a string: " ); // prompt for user input
string input = Console.ReadLine();
// split input text into tokens
string[] words = Regex.Split( input, @"\s+" );
// processing input words
foreach ( var word in words )
{
string wordKey = word.ToLower(); // get word in lowercase
// if the dictionary contains the word
if ( dictionary.ContainsKey( wordKey ) )
{
++dictionary[ wordKey ];
}
else
// add new word with a count of 1 to the dictionary
dictionary.Add( wordKey, 1 );
}
return dictionary;
}
// display dictionary content
private static void DisplayDictionary< K, V >(
SortedDictionary< K, V > dictionary )
{
Console.WriteLine( "\nSorted dictionary contains:\n{0,-12}{1,-12}",
"Key:", "Value:" );
/* generate output for each key in the sorted dictionary
by iterating through the Keys property with a foreach statement*/
foreach ( K key in dictionary.Keys )
Console.WriteLine( "{0,- 12}{1,-12}", key, dictionary[ key ] );
Console.WriteLine( "\nsize: {0}", dictionary.Count );
}
}
Upvotes: 0