Magnus Engdal

Reputation: 5634

Select previous and next word in a string

I'm looping through a lot of strings like this one in C#:

“Look, good against remotes is one thing, good against the living, that’s something else.”

In these strings, I have a single selected word, determined by an index from a previous function, like the second "good" in the case above.

“Look, good (<- not this one) against remotes is one thing, good (<- this one) against the living, that’s something else.”

I want to find the words surrounding my selected word. In the case above, thing and against.

“Look, good against remotes is one thing, good against the living, that’s something else.”

I have tried taking the string apart with .Split() and tried different approaches with regular expressions, but I can't find a good way to achieve this. I have access to the word (good in the example above) and the index (41 above) where it's located in the string.

A huge bonus if it would ignore punctuation and commas, so that in the example above, my theoretical function would only return against since there is a comma between thing and good.

Is there a simple way to achieve this? Any help appreciated.

Upvotes: 5

Views: 5589

Answers (7)

P.Brian.Mackey

Reputation: 44285

Without the regex this can be done recursively with Array.IndexOf.

public class BeforeAndAfterWordFinder
{
    public string Input { get; private set; }
    private string[] words;

    public BeforeAndAfterWordFinder(string input)
    {
        Input = input;
        words = Input.Split(new string[] { ", ", " " }, StringSplitOptions.None);
    }

    public void Run(int occurance, string word)
    {
        int index = -1; // -1 means "nothing found yet"
        OccuranceAfterWord(occurance, word, ref index);
        Print(index);
    }

    private void OccuranceAfterWord(int occurance, string word, ref int lastIndex, int thisOccurance = 0)
    {
        // Resume the search just past the previous hit (starts at 0 on the first call).
        lastIndex = Array.IndexOf(words, word, lastIndex + 1);

        if (lastIndex != -1)
        {
            thisOccurance++;
            if (thisOccurance < occurance)
            {
                OccuranceAfterWord(occurance, word, ref lastIndex, thisOccurance);
            }
        }
    }

    private void Print(int index)
    {
        if (index < 0)
        {
            Console.WriteLine("Occurrence not found.");
            return;
        }

        // Guard both ends of the array so the first and last word do not throw.
        string before = index > 0 ? words[index - 1] : "";
        string after = index < words.Length - 1 ? words[index + 1] : "";
        Console.WriteLine("{0} : {1}", before, after);
    }
}

Usage:

  string input = "Look, good against remotes is one thing, good against the living, that’s something else.";
  var F = new BeforeAndAfterWordFinder(input);
  F.Run(2, "good");  

Upvotes: 0

Gavin

Reputation: 516

Here is a LINQPad program written in VB:

Sub Main
    dim input as string = "Look, good against remotes is one thing, good against the living, that’s something else."

    dim words as new list(of string)(input.split(" "c))

    dim index = getIndex(words)

    dim retVal = GetSurrounding(words, index, "good", 2)

    retVal.dump()
End Sub

function getIndex(words as list(of string)) as dictionary(of string, list(of integer))

    for i as integer = 0 to words.count- 1
            words(i) = getWord(words(i))
    next

    'words.dump()

    dim index as new dictionary(of string, List(of integer))(StringComparer.InvariantCultureIgnoreCase)
    for j as integer = 0 to words.count- 1
            dim word = words(j)
            if index.containsKey(word) then
                    index(word).add(j)
            else  
                    index.add(word, new list(of integer)({j}))
            end if
    next

    'index.dump()
    return index
end function

function getWord(candidate) as string
    dim pattern as string = "^[\w'’]+"
    dim match = Regex.Match(candidate, pattern)
    if match.success then
            return match.toString()
    else
            return candidate
    end if
end function 

function GetSurrounding(words, index, word, position) as tuple(of string, string)        

    if not index.containsKey(word) then
            return nothing
    end if

    dim indexEntry = index(word)
    if position > indexEntry.count
            'not enough appearances of word
            return nothing
    else
            dim left = ""
            dim right = ""
            dim positionInWordList = indexEntry(position -1)
            if PositionInWordList >0
                    left = words(PositionInWordList-1)
            end if
            if PositionInWordList < words.count -1
                    right = words(PositionInWordList +1)
            end if

            return new tuple(of string, string)(left, right)
    end if
end function

Upvotes: 0

Andrew Clark

Reputation: 208615

Including the "huge bonus":

string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
string word = "good";
int index = 41;

string before = Regex.Match(text.Substring(0, index), @"(\w*)\s*$").Groups[1].Value;
string after = Regex.Match(text.Substring(index + word.Length), @"^\s*(\w*)").Groups[1].Value;

In this case before will be an empty string because of the comma, and after will be "against".

Explanation: To get before, first take the part of the string that precedes the target word; text.Substring(0, index) does this. Then the regular expression (\w*)\s*$ matches and captures a word (\w*) followed by any amount of whitespace (\s*) at the end of the string ($). The first capture group contains the word we want. If no word sits directly in front of that trailing whitespace, the regex still matches, but it matches only an empty string or whitespace, and the first capture group is an empty string.

The logic for getting after is pretty much the same, except that text.Substring(index + word.Length) is used to get the rest of the string after the target word. The regex ^\s*(\w*) is similar except that it is anchored to the beginning of the string with ^ and the \s* comes before the \w* since we need to strip off whitespace on the front end of the word.
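As a rough sketch, the two regex calls above could be wrapped in a small helper. The class name NeighbourWords and the method name Find are made up for this illustration and are not part of the answer; the regular expressions are the ones shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NeighbourWords
{
    // Hypothetical helper: returns the words directly before and after the word
    // that starts at `index`. A neighbour is "" when anything other than
    // whitespace (for example a comma) separates it from the selected word.
    public static (string Before, string After) Find(string text, string word, int index)
    {
        string before = Regex.Match(text.Substring(0, index), @"(\w*)\s*$").Groups[1].Value;
        string after = Regex.Match(text.Substring(index + word.Length), @"^\s*(\w*)").Groups[1].Value;
        return (before, after);
    }

    public static void Main()
    {
        string text = "Look, good against remotes is one thing, good against the living, that’s something else.";

        var second = Find(text, "good", 41);
        Console.WriteLine($"[{second.Before}] [{second.After}]"); // prints "[] [against]"

        var other = Find(text, "against", 11);
        Console.WriteLine($"[{other.Before}] [{other.After}]");   // prints "[good] [remotes]"
    }
}
```

Note that \w does not match the typographic apostrophe in "that’s", so a neighbour containing it would be cut off at that character.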

Upvotes: 5

Nikolai Samteladze

Reputation: 7797

string phrase = "Look, good against remotes is one thing, good against the living, that’s something else.";
int selectedPosition = 41;
char[] ignoredSpecialChars = new char[2] { ',', '.' };

string afterWord = phrase.Substring(selectedPosition)
                         .Split(' ')[1]
                         .Trim(ignoredSpecialChars);
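string beforeWord = phrase.Substring(0, selectedPosition)
                          .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) // skip the empty entry left by the trailing space
                          .Last()
                          .Trim(ignoredSpecialChars);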

You can change the ignoredSpecialChars array to control which special characters are stripped from the result.

UPDATE:

This will return null if there are any special characters between your word and words that surround it.

string phrase = "Look, good against remotes is one thing, good against the living, that’s something else.";
int selectedPosition = 41;
char[] ignoredSpecialChars = new char[2] { ',', '.' };

string afterWord = phrase.Substring(selectedPosition)
                         .Split(' ')[1];
afterWord = Char.IsLetterOrDigit(afterWord.First()) ?
            afterWord.TrimEnd(ignoredSpecialChars) : 
            null;

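string beforeWord = phrase.Substring(0, selectedPosition)
                          .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) // skip the empty entry left by the trailing space
                          .Last();
beforeWord = Char.IsLetterOrDigit(beforeWord.Last()) ?
             beforeWord.TrimStart(ignoredSpecialChars) : 
             null;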

Upvotes: 3

Sergey Berezovskiy

Reputation: 236288

You can use the regular expression [^’a-zA-Z0-9]+ to get the words from your string:

string[] words = Regex.Split(text, @"[^’a-zA-Z0-9]+");

Implementing navigation is up to you. Store the index of the selected word and use it to get the next or the previous one:

int index = Array.IndexOf(words, "living"); // first occurrence; -1 if not found
string next = null;
string previous = null;

if (index >= 0 && index < words.Length - 1)
    next = words[index + 1]; // that’s

if (index > 0)
    previous = words[index - 1]; // the
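Array.IndexOf only finds the first occurrence, so for a repeated word such as "good" the character index from the question has to be mapped to a position in the word list. One possible sketch of that navigation, assuming the same character class as above but using Regex.Matches so that each word keeps its start position (WordNavigator and Neighbours are hypothetical names):

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordNavigator
{
    // Hypothetical helper: finds the word that starts at character position
    // `charIndex` and returns its neighbours. A missing neighbour is "".
    // Punctuation is treated purely as a separator, so a comma between two
    // words does not stop them from being neighbours.
    public static (string Previous, string Next) Neighbours(string text, int charIndex)
    {
        MatchCollection words = Regex.Matches(text, @"[’a-zA-Z0-9]+");
        for (int i = 0; i < words.Count; i++)
        {
            if (words[i].Index != charIndex)
                continue;

            string previous = i > 0 ? words[i - 1].Value : "";
            string next = i < words.Count - 1 ? words[i + 1].Value : "";
            return (previous, next);
        }
        return ("", ""); // no word starts at charIndex
    }

    public static void Main()
    {
        string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
        var result = Neighbours(text, 41);
        Console.WriteLine($"{result.Previous} : {result.Next}"); // prints "thing : against"
    }
}
```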

Upvotes: 0

Pinguin895

Reputation: 1011

I haven't tested it yet, but it should work. You can look at the substrings before and after the word and then search for the last " " in the first one and the first " " in the second one. Then you know where the neighbouring words start and end.

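string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
string word = "good";
int index = 41;

string before = text.Substring(0, index).Trim();            // everything in front of the word, without the trailing " "
string after = text.Substring(index + word.Length).Trim();  // everything behind the word, without the leading " "

int indexBefore = before.LastIndexOf(" ");  // -1 if only one word precedes the selected word
int indexAfter = after.IndexOf(" ");        // -1 if only one word follows the selected word

string wordBefore = before.Substring(indexBefore + 1);                       // "thing,"
string wordAfter = indexAfter < 0 ? after : after.Substring(0, indexAfter);  // "against"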

EDIT

If you want to ignore punctuation and commas, just remove them from your string.
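One thing to watch for when removing characters: every comma or period removed in front of the selected word shifts its index to the left. A minimal sketch of a removal step that keeps the index in sync, assuming only ',' and '.' are to be removed (PunctuationStripper and Strip are hypothetical names):

```csharp
using System;
using System.Text;

public static class PunctuationStripper
{
    // Hypothetical helper: removes ',' and '.' from `text` and returns the
    // cleaned string. `index` is moved left by one for every removed character
    // that sat in front of it, so it keeps pointing at the same word.
    public static string Strip(string text, ref int index)
    {
        var cleaned = new StringBuilder(text.Length);
        int shift = 0;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == ',' || c == '.')
            {
                if (i < index)
                    shift++;
                continue;
            }
            cleaned.Append(c);
        }
        index -= shift;
        return cleaned.ToString();
    }

    public static void Main()
    {
        string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
        int index = 41;
        string cleaned = Strip(text, ref index);
        Console.WriteLine(index);                        // prints 39
        Console.WriteLine(cleaned.Substring(index, 4));  // prints "good"
    }
}
```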

Upvotes: 0

Chrille Krycka

Reputation: 32

Create a string with the punctuation and commas removed (for example with Replace). In that string, search for the substring "thing good against", and so on if needed.

Upvotes: -2
