Reputation: 26508
I have a string that can contain any characters, and I need to clean it up as follows:
a) Retain only alphanumeric characters (spaces between words stay, as in the examples below).
b) Remove the word "The" when it appears as a separate word, that is, with a space before or after it.
e.g.
CASE 1:
Input: The Company Pvt Ltd.
Output: Company Pvt Ltd
But
Input: TheCompany Pvt Ltd.
Output: TheCompany Pvt Ltd
because there is no space between the words The and Company.
CASE 2:
Similarly,
Input: Company Pvt Ltd. The
Output: Company Pvt Ltd
But
Input: Company Pvt Ltd.The
Output: Company Pvt Ltd
CASE 3:
Input: Company@234 Pvt; Ltd.
Output: Company234 Pvt Ltd
The output must not contain commas, periods, or any other special characters.
I am assigning the data to a property like this:
_company.ShortName = _company.CompanyName.ToUpper();
I cannot change anything at save time, so the filter has to be applied only when I read the data back from the database. The value arrives in _company.CompanyName, and that is what I have to filter.
So far I have this:
public string ReplaceCharacters(string words)
{
    words = words.Replace(",", " ");
    words = words.Replace(";", " ");
    words = words.Replace(".", " ");
    words = words.Replace("THE ", " ");
    words = words.Replace(" THE", " ");
    return words;
}

private void button1_Click(object sender, EventArgs e)
{
    MessageBox.Show(ReplaceCharacters(textBox1.Text.ToUpper()));
}
I am using C#. Thanks in advance.
Upvotes: 2
Views: 736
Reputation: 33153
Here is a basic regex that matches your supplied cases, with one caveat: as Kobi says, your supplied cases are inconsistent, so I've taken the periods out of the first four tests. If you need both behaviours, please add a comment.
This handles all the cases you require, but the rapid proliferation of edge cases makes me think you may want to reconsider the initial problem.
[TestMethod]
public void RegexTest()
{
    Assert.AreEqual("Company Pvt Ltd", RegexMethod("The Company Pvt Ltd"));
    Assert.AreEqual("TheCompany Pvt Ltd", RegexMethod("TheCompany Pvt Ltd"));
    Assert.AreEqual("Company Pvt Ltd", RegexMethod("Company Pvt Ltd. The"));
    Assert.AreEqual("Company Pvt LtdThe", RegexMethod("Company Pvt Ltd.The"));
    Assert.AreEqual("Company234 Pvt Ltd", RegexMethod("Company@234 Pvt; Ltd."));
    // Two new tests for new requirements
    Assert.AreEqual("CompanyThe Ltd", RegexMethod("CompanyThe Ltd."));
    Assert.AreEqual("theasdasdatheapple", RegexMethod("the theasdasdathe the the the ....apple,,,, the"));
    // And the case where you have THETHE at the start
    Assert.AreEqual("CCC", RegexMethod("THETHE CCC"));
}

public string RegexMethod(string input)
{
    // Old method before new requirement
    //return Regex.Replace(input, @"The | The|[^A-Z0-9\s]", string.Empty, RegexOptions.IgnoreCase);
    // New method that anchors the first the
    //return Regex.Replace(input, @"^The | The|[^A-Z0-9\s]", string.Empty, RegexOptions.IgnoreCase);
    // And a third method that does look behind and ahead for the last test
    return Regex.Replace(input, @"^(The)+\s|\s(?<![A-Z0-9])[\s]*The[\s]*(?![A-Z0-9])| The$|[^A-Z0-9\s]", string.Empty, RegexOptions.IgnoreCase);
}
The example includes a test method that exercises RegexMethod, which holds the regular expression. To use this in your code, you only need the second method (RegexMethod).
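As a rough sketch of how this could be wired into the assignment from the question, the filter can be applied just before the ToUpper() call. The Company class, CompanyNameFilter, Apply, and FillShortName are hypothetical names invented for illustration; only the pattern and the CompanyName/ShortName properties come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the asker's _company object.
public class Company
{
    public string CompanyName { get; set; }
    public string ShortName { get; set; }
}

public static class CompanyNameFilter
{
    // Same pattern as RegexMethod above, built once and reused.
    private static readonly Regex Cleaner = new Regex(
        @"^(The)+\s|\s(?<![A-Z0-9])[\s]*The[\s]*(?![A-Z0-9])| The$|[^A-Z0-9\s]",
        RegexOptions.IgnoreCase);

    // Removes standalone "The" and non-alphanumeric characters.
    // A null input is treated as an empty string (an assumption).
    public static string Apply(string input)
    {
        return Cleaner.Replace(input ?? string.Empty, string.Empty);
    }

    // Filters the name read from the database, then upper-cases it,
    // mirroring the assignment shown in the question.
    public static void FillShortName(Company company)
    {
        company.ShortName = Apply(company.CompanyName).ToUpper();
    }
}
```

With this in place, a company whose CompanyName is "The Company Pvt Ltd" should end up with the ShortName "COMPANY PVT LTD". Filtering before upper-casing is not essential here, since the pattern already uses RegexOptions.IgnoreCase.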
Upvotes: 10
Reputation: 138037
string company = "Company; PvtThe Ltd.The . The the.the";
company = Regex.Replace(company, @"\bthe\b", "", RegexOptions.IgnoreCase);
company = Regex.Replace(company, @"[^\w ]", "");
company = Regex.Replace(company, @"\s+", " ");
company = company.Trim();
// company == "Company PvtThe Ltd"
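Each line is one step: remove the standalone word "the", strip the special characters, collapse runs of whitespace, and trim. Steps 1 and 2 can be combined into a single regex, but keeping them separate is clearer.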
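For reference, a minimal sketch of the combined variant, wrapped in a helper. The names CompanyNameCleaner and Clean are made up for illustration. Merging the two patterns with an alternation should give the same result as running them in sequence, because a standalone "the" consists only of word characters and so never overlaps with a character matched by [^\w ]. Note that \w also keeps underscores and non-ASCII letters and digits, which is slightly broader than "alphanumeric".

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompanyNameCleaner
{
    // Hypothetical helper combining steps 1 and 2 of the answer above.
    public static string Clean(string company)
    {
        // Steps 1 + 2: drop the standalone word "the" (any casing) and
        // every character that is neither a word character nor a space.
        company = Regex.Replace(company, @"\bthe\b|[^\w ]", "", RegexOptions.IgnoreCase);

        // Step 3: collapse runs of whitespace into a single space.
        company = Regex.Replace(company, @"\s+", " ");

        // Step 4: remove leading and trailing spaces.
        return company.Trim();
    }
}
```

Calling CompanyNameCleaner.Clean("Company; PvtThe Ltd.The . The the.the") should return "Company PvtThe Ltd", the same value as the four-step version.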
Upvotes: 2