Reputation: 51
I have a list of names and a text, and I want to find which names occur in the text, replace them, and count the replacements:
Code:
string[] Names = new string[] { "SNOW","Jhon Snow","ADEMS","RONALDO",
"AABY", "AADLAND", "ANGE", "GEEN", "KHA", "AN", "ANG", "EE", "GEE", "HA", "HAN", "KHAN",
"LA", "LAN", "LAND", "NG", "SA", "SAN", "SANG", "LAN","HAN", "LAN", "SANG", "SANG",
"Sangeen Khan"};
string Text = "I am Sangeen Khan and i am friend AABY. Jhon is friend of AABY. " +
"AADLAND is good boy and he never speak lie. AABY is also good. SANGEEN KHAN is my name.";
List<string> matchedWords = Names.Where(Text.Contains).ToList();
matchedWords.ForEach(w => Text = Regex.Replace(Text, "\\b" + w + "\\b",
"Names", RegexOptions.IgnoreCase));
int numMatchedWords = matchedWords.Count;
Console.WriteLine($"Matched Words: {string.Join(",", matchedWords.ToArray())}");
Console.WriteLine($"Count: {numMatchedWords}");
Console.WriteLine($"Replaced Text: {Text}");
Output:
Matched Words: AABY, AADLAND, ANGE, GEEN, KHA, AN, ANG, EE, GEE, HA, HAN, KHAN, LA, LAN, LAND, NG, SA, SAN, SANG, LAN, HAN, LAN, SANG, SANG, Sangeen Khan
Replaced Text:I am Sangeen Names and i am friend Names. Jhon is friend of Names. Names is good boy and he never speak lie. Names is also good. SANGEEN Names is my name.
Count: 25
Problems: the code reports the wrong "Matched Words" and the wrong number of replacements (Count). The replacement itself, however, became correct after I read "String compare C# - whole word match".
My desired output would be:
Matched Words: Sangeen Khan, AABY, KHAN, AADLAND.
Replaced Text: I am Names and i am friend Names. jhon is friend of Names. Names is good boy and he never speak lie. Names is also good. Names KHAN is my name.
Count: 7
Upvotes: 1
Views: 275
Reputation: 3784
The problem you are facing comes from doing the replacement step by step. Let me explain. Say you have these values:
string[] Names = { "Khan", "se" };
string Text = "Senator Khane";
If you run the step-by-step replacement with these inputs, you will get:
"Senator NameNames"
Let's analyze the problem step by step. First, case sensitivity: string comparison in C# is case-sensitive by default, which means that "Se" is different from "se". This is why the word "Senator" was never replaced.
The other problem is the "NameNames" part. Let's walk through the execution.
The first step is:
Text = Text.Replace("Khan", "Names");
which sets Text to "Senator Namese". The next ForEach step is:
Text = Text.Replace("se", "Names");
So the 's' of "Names" plus the 'e' left over from "Khane" form a new match for "se", which is then replaced, producing the unwanted "NameNames".
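To reproduce the effect described above, here is a minimal sketch. The class name ChainedReplaceDemo and the method ReplaceSequentially are made up for this illustration; the inputs are the ones from the example.

```csharp
using System;

public static class ChainedReplaceDemo
{
    // Hypothetical helper: replaces each name in turn, feeding the
    // result of one replacement into the next one.
    public static string ReplaceSequentially(string text, string[] names, string replacement)
    {
        foreach (var name in names)
        {
            // string.Replace is case-sensitive and matches substrings,
            // so text inserted by an earlier step can match a later name.
            text = text.Replace(name, replacement);
        }
        return text;
    }

    public static void Main()
    {
        var result = ReplaceSequentially("Senator Khane", new[] { "Khan", "se" }, "Names");
        Console.WriteLine(result); // prints "Senator NameNames"
    }
}
```

The second pass finds "se" only because the first pass created it, which is why a single pass over the original text avoids the problem.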
Now that we understand the problem with your code, let's fix it.
The .NET Framework already has a class that does this kind of replacement for us:
System.Text.RegularExpressions.Regex
To use it, you first need to build a regex pattern. I won't go deep into how regex patterns are constructed; search for it if you need to, since it is a very commonly discussed subject in many forums.
var names = new string[] { "SNOW","Jhon Snow","ADEMS","RONALDO",
"AABY", "AADLAND", "ANGE", "GEEN", "KHA", "AN", "ANG", "EE", "GEE", "HA", "HAN",
"KHAN", "LA", "LAN", "LAND", "NG", "SA", "SAN", "SANG", "LAN",
"HAN", "LAN", "SANG", "SANG", "Sangeen Khan" };
var text = "I am Sangeen Khan and i am friend AABY. Jhon is friend of AABY. " +
"AADLAND is good boy and he never speak lie. " +
"AABY is also good. SANGEEN KHAN is my name.";
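// Escape each name so that regex metacharacters in it are matched literally,
// and require a non-word character (or the start/end of the text) on both sides.
var pattern = string.Join("|", names
.Select(n => $@"((?<=(^|\W)){Regex.Escape(n)}(?=($|\W)))"));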
var regex = new Regex(pattern);
var matchedWords = regex
.Matches(text)
.Cast<Match>()
.Select(m => m.Value)
//.Distinct()
.ToList();
text = regex.Replace(text, "Names");
Console.WriteLine($"Matched Words: {string.Join(", ", matchedWords.Distinct())}");
Console.WriteLine($"Count: {matchedWords.Count}");
Console.WriteLine($"Replaced Text: {text}");
I wrote this code without Visual Studio, VS Code, or LINQPad, so if it has a problem please let me know. (I will check it myself later tonight.)
Upvotes: 1
Reputation: 173
This would only work on "whole" words:
string[] Names = new string[] { "Sangeen Khan", "AABY","AADLAND","LAND","LAND","SANG",
"jh", "han", "ngee","SNOW","Jhon Snow","ADEMS","RONALDO"};
string Text = "I am Sangeen Khan and I am friend of AABY. Jhon is also friend of AABY. AADLAND is good boy and he never speak lie.AABY is also good. SANGEEN KHAN is my name.";
string replace = "Names";
foreach (var name in Names)
{
    string pattern = @"\b" + name + @"\b";
    Text = Regex.Replace(Text, pattern, replace);
}
Console.WriteLine(Text);
Output:
I am Names and I am friend of Names. Jhon is also friend of Names. Names is good boy and he never speak lie.Names is also good. SANGEEN KHAN is my name.
Keep in mind that this is case-sensitive! To make it case-insensitive, the pattern should be as follows:
string pattern = @"(?i)\b" + name + @"\b";
Output for case insensitive:
I am Names and I am friend of Names. Jhon is also friend of Names. Names is good boy and he never speak lie.Names is also good. Names is my name.
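The loop above concatenates each name straight into the pattern, so a name containing regex metacharacters (a '.' or '+', for instance) would be interpreted as pattern syntax. A possible variant, sketched here under the assumption that a per-name loop is still wanted, escapes each name with Regex.Escape, passes RegexOptions.IgnoreCase instead of the inline (?i), and counts the replacements. The class WholeWordReplacer and its ReplaceNames method are hypothetical names for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordReplacer
{
    // Hypothetical helper: replaces whole-word, case-insensitive occurrences
    // of each name and reports how many replacements were made in total.
    public static (string Text, int Count) ReplaceNames(string text, string[] names, string replacement)
    {
        int count = 0;
        foreach (var name in names)
        {
            // Regex.Escape makes characters such as '.' or '+' match literally.
            string pattern = @"\b" + Regex.Escape(name) + @"\b";

            // The evaluator runs once per match, so it can count replacements.
            text = Regex.Replace(
                text,
                pattern,
                _ => { count++; return replacement; },
                RegexOptions.IgnoreCase);
        }
        return (text, count);
    }

    public static void Main()
    {
        var names = new[] { "Sangeen Khan", "AABY", "han" };
        var result = ReplaceNames("AABY met Sangeen Khan. SANGEEN KHAN is my name.", names, "Names");
        Console.WriteLine(result.Text);  // prints "Names met Names. Names is my name."
        Console.WriteLine(result.Count); // prints 3
    }
}
```

Note that \b only behaves as a whole-word boundary when the name starts and ends with a word character, so names that begin or end with punctuation would need a different boundary check.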
Upvotes: 0
Reputation: 897
It's a good idea to prioritize longer matches. Also, definitely sanitize/standardize your names.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
namespace Rextester
{
    public class Program
    {
        public static void Main(string[] args)
        {
            string[] Names = new string[] { "Sangeen Khan", "AABY","AADLAND","LAND","LAND","SANG",
                "jh", "han", "ngee","SNOW","Jhon Snow","ADEMS","RONALDO"};
            //Names = Standardize(Names);
            string Text = @"I am Sangeen Khan and I am friend of AABY. Jhon is also friend of AABY.
AADLAND is good boy and he never speak lie. AABY is also good. SANGEEN KHAN is my name.";
            //Text = Standardize(Text);
            List<string> matchedWords = Names.Where(Text.Contains).OrderBy(x => x.Length).Reverse().ToList(); //Prioritize longer matches...
            matchedWords.ForEach(w => Text = Text.Replace(w, "Names")); //By replacing longer matched names first
            //listBox2.DataSource = matchedWords;
            int numMatchedWords = matchedWords.Count;
            Console.WriteLine("Matched Words: " + matchedWords.Aggregate((i, j) => i + " " + j));
            Console.WriteLine("Count: " + numMatchedWords);
            Console.WriteLine("Replaced Text: " + Text);
        }
    }
}
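The longest-first idea can also be combined with whole-word matching in a single pass. The sketch below is not the code above: it swaps the plain Text.Replace for one regex alternation, and the class LongestFirstReplacer and its ReplaceNames method are hypothetical names. It assumes the replacement text contains no '$', which Regex.Replace would treat as a substitution marker.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LongestFirstReplacer
{
    // Hypothetical helper: builds one alternation with the longest names
    // first and replaces all whole-word matches in a single pass.
    public static string ReplaceNames(string text, string[] names, string replacement)
    {
        if (names.Length == 0)
        {
            return text; // an empty alternation would match at every word boundary
        }

        // Longest names first, so "Jhon Snow" is tried before "Jhon".
        var alternatives = names
            .Distinct()
            .OrderByDescending(n => n.Length)
            .Select(Regex.Escape);

        string pattern = @"\b(?:" + string.Join("|", alternatives) + @")\b";
        return Regex.Replace(text, pattern, replacement, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        var names = new[] { "Jhon", "Jhon Snow" };
        Console.WriteLine(ReplaceNames("Jhon Snow met Jhon.", names, "Names"));
        // prints "Names met Names."
    }
}
```

Without the ordering, the alternation would try "Jhon" first at the start of "Jhon Snow" and leave " Snow" behind; and because everything happens in one pass, the inserted "Names" is never scanned again.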
Upvotes: 0