DMur

Reputation: 659

Check string for two consecutive letters followed by a lower case character

I have a collection of strings. I added a "." at the end of each in a foreach loop and concatenated them into a single string.

However, not all strings after concatenation need a "."

So now I have one long string from which I want to remove the unnecessary "." characters. For example, I need to check a string like "awefeaefe. efewgwe waggrgrgae. Weefwafewf ewfefw. Ewfewfgewgr. ewgfewg"

Where ". " is followed by a lower case character, for example "e", I want to delete the "."

Where ". " is followed by an upper case character, for example "E", do nothing.

I have tried a foreach (char c in parsedPara) loop that collects every 3 characters and checks each group, but it misses 2 out of every 3 combinations because it consumes the characters in consecutive, non-overlapping groups. Even when I do find a combination, I don't know how to get the index of the matching "." in my original string from inside the loop.

I have also tried building badString = ". " + char.ToLower(), but I don't have a character to pass to char.ToLower(). I know the 3rd character will be lowercase, but I don't know which character it will be.

Example as requested:

    public class AnalyzeImage
    {
        public async Task analyzeImage(string imageUri)
        {
            string endpoint = Environment.GetEnvironmentVariable("VISION_ENDPOINT");
            string key = Environment.GetEnvironmentVariable("VISION_KEY");

            ImageAnalysisClient client = new ImageAnalysisClient(new Uri(endpoint), new AzureKeyCredential(key));

            ImageAnalysisResult result = client.Analyze(new Uri(imageUri), VisualFeatures.Read, new ImageAnalysisOptions { GenderNeutralCaption = true });

            string cleanLine;
            string parsedPara = string.Empty;
            string miniString = string.Empty;

            int i = 0;
            foreach (DetectedTextBlock block in result.Read.Blocks)
            {
                foreach (DetectedTextLine line in block.Lines)
                {
                    cleanLine = line.Text.Replace("'", "");

                    if (!cleanLine.EndsWith(".") || !cleanLine.EndsWith(",") || !cleanLine.EndsWith("!") || !cleanLine.EndsWith("?") || !cleanLine.EndsWith("-"))
                    {
                        cleanLine += ". "; //Add period character to the end of strings missing a closing character.
                    }
                    if (cleanLine.EndsWith(".."))
                    {
                        cleanLine.Remove(cleanLine.Length - 1);
                    }

                    parsedPara += cleanLine; //Concatenate strings into a single string.

                    foreach (char c in parsedPara) //This is where I start trying to check every combination of 3 characters to identify any misplaced period characters mid sentence.
                    {
                        miniString = miniString + c;

                        if (miniString.Length > 2)
                        {
                            miniString = string.Empty;
                        } else if (miniString.Length == 3) 
                        {
                            char firstChar = miniString[0];
                            char secondChar = miniString[1];
                            char thirdChar = miniString[2];

                            if(firstChar.ToString() == "." && secondChar.ToString() == " " && char.IsLower(thirdChar))
                            {
                                Debug.WriteLine("HIT!");
                            }

                        }
                        i++;
                    }
                }
            }
            Debug.WriteLine("parsedPara: " + parsedPara);
        }
    }
}

Upvotes: 0

Views: 61

Answers (1)

DuesserBaest

Reputation: 2829

Try using regex by matching:

\.(?=\s*[a-z])

and replacing with an empty string. See: regex101


Explanation

MATCH:

  • \.: Match a literal dot
  • (?= ... ): only if it is followed by
    • \s*: any amount of whitespace characters (change to a literal space if you only ever have a single space)
    • [a-z]: and a lowercase letter.
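
In C#, the pattern above could be applied with Regex.Replace. The following is only a minimal sketch: the class name PeriodCleaner and the method name RemoveStrayPeriods are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (name chosen for this example only).
public static class PeriodCleaner
{
    // A literal dot, matched only when optional whitespace and then
    // an ASCII lowercase letter follow it. The lookahead does not consume
    // the whitespace or the letter, so only the dot itself is replaced.
    private static readonly Regex StrayPeriod = new Regex(@"\.(?=\s*[a-z])");

    // Returns a copy of text with every such dot removed.
    public static string RemoveStrayPeriods(string text)
    {
        return StrayPeriod.Replace(text, string.Empty);
    }
}
```

Calling PeriodCleaner.RemoveStrayPeriods(parsedPara) once, after the loops have finished building parsedPara, should be enough. For the sample string in the question it gives "awefeaefe efewgwe waggrgrgae. Weefwafewf ewfefw. Ewfewfgewgr ewgfewg". Two caveats: because \s* also matches zero whitespace, a dot directly followed by a lowercase letter (as in "file.txt") is removed too, and [a-z] only covers ASCII letters. If the text may contain accented lowercase letters, \p{Ll} could be used in place of [a-z].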

Upvotes: 2
