Reputation: 5634
I'm looping through a lot of strings like this one in C#:
“Look, good against remotes is one thing, good against the living, that’s something else.”
In these strings, I have a single selected word, determined by an index from a previous function, like the second "good" in the case above.
“Look, good (<- not this one) against remotes is one thing, good (<- this one) against the living, that’s something else.”
I want to find the words surrounding my selected word. In the case above, thing and against.
“Look, good against remotes is one thing, good against the living, that’s something else.”
I have tried taking the string apart with .Split() and different approaches with regular expressions, but I can't find a good way to achieve this. I have access to the word (good in the example above) and the index (41 above) where it's located in the string.
A huge bonus if it would not look past punctuation and commas, so that in the example above, my theoretical function would only return against, since there is a comma between thing and good.
Is there a simple way to achieve this? Any help appreciated.
Upvotes: 5
Views: 5589
Reputation: 44285
Without regex, this can be done recursively with Array.IndexOf.
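public class BeforeAndAfterWordFinder
{
    public string Input { get; private set; }
    private readonly string[] words;

    public BeforeAndAfterWordFinder(string input)
    {
        Input = input;
        words = Input.Split(new string[] { ", ", " " }, StringSplitOptions.None);
    }

    public void Run(int occurance, string word)
    {
        int index = -1; // -1 means "nothing found yet"
        OccuranceAfterWord(occurance, word, ref index);
        Print(index);
    }

    // Recursively moves lastIndex to the requested occurrence of word,
    // or leaves it at -1 if the word does not occur that often.
    private void OccuranceAfterWord(int occurance, string word, ref int lastIndex, int thisOccurance = 0)
    {
        // Always search from the element after the previous hit, so a match
        // at position 0 is not counted twice.
        lastIndex = Array.IndexOf(words, word, lastIndex + 1);
        if (lastIndex != -1)
        {
            thisOccurance++;
            if (thisOccurance < occurance)
            {
                OccuranceAfterWord(occurance, word, ref lastIndex, thisOccurance);
            }
        }
    }

    private void Print(int index)
    {
        if (index < 0)
        {
            Console.WriteLine("Occurrence not found.");
            return;
        }
        // Guard both ends so a match on the first or last word does not throw.
        string before = index > 0 ? words[index - 1] : "";
        string after = index < words.Length - 1 ? words[index + 1] : "";
        Console.WriteLine("{0} : {1}", before, after);
    }
}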
Usage:
string input = "Look, good against remotes is one thing, good against the living, that’s something else.";
var F = new BeforeAndAfterWordFinder(input);
F.Run(2, "good");
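Run takes an occurrence number, while the question starts from a character index. One way to bridge the two is to count the matches up to that index. This is only a sketch: OccurrenceCounter and OccurrenceAt are hypothetical names, not part of the class above, and it counts plain substring matches, so a string containing "goodness" would also be counted even though the finder above only matches whole split tokens.

```csharp
using System;

public static class OccurrenceCounter
{
    // Hypothetical helper: returns the 1-based occurrence number of the
    // `word` that starts at `charIndex`, or 0 if `word` does not start there.
    public static int OccurrenceAt(string text, string word, int charIndex)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (string.IsNullOrEmpty(word))
            throw new ArgumentException("word must not be empty", nameof(word));

        int count = 0;
        int pos = text.IndexOf(word, StringComparison.Ordinal);
        while (pos != -1 && pos <= charIndex)
        {
            count++;
            if (pos == charIndex) return count;
            // Continue searching one character after the previous hit.
            pos = text.IndexOf(word, pos + 1, StringComparison.Ordinal);
        }
        return 0;
    }
}
```

With the example string, OccurrenceCounter.OccurrenceAt(input, "good", 41) gives the occurrence number to pass to F.Run.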
Upvotes: 0
Reputation: 516
Here is a LINQPad program written in VB:
Sub Main
    dim input as string = "Look, good against remotes is one thing, good against the living, that’s something else."
    dim words as new list(of string)(input.split(" "c))
    dim index = getIndex(words)
    dim retVal = GetSurrounding(words, index, "good", 2)
    retVal.dump()
End Sub

' Strips punctuation from each word in place, then maps every word to the positions where it occurs.
function getIndex(words as list(of string)) as dictionary(of string, list(of integer))
    for i as integer = 0 to words.count - 1
        words(i) = getWord(words(i))
    next
    'words.dump()
    dim index as new dictionary(of string, List(of integer))(StringComparer.InvariantCultureIgnoreCase)
    for j as integer = 0 to words.count - 1
        dim word = words(j)
        if index.containsKey(word) then
            index(word).add(j)
        else
            index.add(word, new list(of integer)({j}))
        end if
    next
    'index.dump()
    return index
end function

' Returns the leading run of word characters and apostrophes, or the candidate unchanged if there is none.
function getWord(candidate as string) as string
    dim pattern as string = "^[\w'’]+"
    dim match = Regex.Match(candidate, pattern)
    if match.success then
        return match.toString()
    else
        return candidate
    end if
end function

' position is the 1-based occurrence of word; returns (left, right) or nothing.
function GetSurrounding(words as list(of string), index as dictionary(of string, list(of integer)), word as string, position as integer) as tuple(of string, string)
    if not index.containsKey(word) then
        return nothing
    end if
    dim indexEntry = index(word)
    if position > indexEntry.count then
        'not enough appearances of word
        return nothing
    else
        dim left = ""
        dim right = ""
        dim positionInWordList = indexEntry(position - 1)
        if positionInWordList > 0 then
            left = words(positionInWordList - 1)
        end if
        if positionInWordList < words.count - 1 then
            right = words(positionInWordList + 1)
        end if
        return new tuple(of string, string)(left, right)
    end if
end function
Upvotes: 0
Reputation: 208615
Including the "huge bonus":
string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
string word = "good";
int index = 41;
string before = Regex.Match(text.Substring(0, index), @"(\w*)\s*$").Groups[1].Value;
string after = Regex.Match(text.Substring(index + word.Length), @"^\s*(\w*)").Groups[1].Value;
In this case, before will be an empty string because of the comma, and after will be "against".
Explanation: When getting before, the first step is to grab just the first part of the string, up to just before the target word; text.Substring(0, index) does this. Then we use the regular expression (\w*)\s*$ to match and capture a word (\w*) followed by any amount of whitespace (\s*) at the end of the string ($). The contents of the first capture group is the word we want. If no word can be matched, the regex still matches, but only an empty string or whitespace, and the first capture group contains an empty string.
The logic for getting after is pretty much the same, except that text.Substring(index + word.Length) is used to get the rest of the string after the target word. The regex ^\s*(\w*) is similar, except that it is anchored to the beginning of the string with ^ and the \s* comes before the \w*, since we need to strip off whitespace at the front of the word.
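If this is needed in a loop over many strings, the two calls could be wrapped in a helper. A minimal sketch, assuming index points at the first character of word; SurroundingWords and Find are hypothetical names, and the tuple return type needs C# 7 or later. An empty string on either side means no adjacent word was found there, for example because punctuation sits in between.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SurroundingWords
{
    // Hypothetical helper (names are illustrative): returns the words directly
    // before and after the word that starts at `index` in `text`.
    public static (string Before, string After) Find(string text, string word, int index)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (word == null) throw new ArgumentNullException(nameof(word));
        if (index < 0 || index + word.Length > text.Length)
            throw new ArgumentOutOfRangeException(nameof(index));

        string head = text.Substring(0, index);            // everything before the word
        string tail = text.Substring(index + word.Length); // everything after the word

        // Same patterns as above: a word plus optional whitespace at the end
        // of head, and optional whitespace plus a word at the start of tail.
        string before = Regex.Match(head, @"(\w*)\s*$").Groups[1].Value;
        string after = Regex.Match(tail, @"^\s*(\w*)").Groups[1].Value;
        return (before, after);
    }
}
```

Usage would look like var result = SurroundingWords.Find(text, word, index); followed by reading result.Before and result.After.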
Upvotes: 5
Reputation: 7797
string phrase = "Look, good against remotes is one thing, good against the living, that’s something else.";
int selectedPosition = 41;
char[] ignoredSpecialChars = new char[2] { ',', '.' };
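string afterWord = phrase.Substring(selectedPosition)
                         .Split(' ')[1]
                         .Trim(ignoredSpecialChars);
// The text before the word ends with a space, so drop empty entries;
// otherwise Last() would return an empty string.
string beforeWord = phrase.Substring(0, selectedPosition)
                          .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                          .Last()
                          .Trim(ignoredSpecialChars);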
You can change the ignoredSpecialChars array to get rid of the special characters you don't need.
UPDATE:
This will return null if there are any special characters between your word and the words that surround it.
string phrase = "Look, good against remotes is one thing, good against the living, that’s something else.";
int selectedPosition = 41;
char[] ignoredSpecialChars = new char[2] { ',', '.' };
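string afterWord = phrase.Substring(selectedPosition)
                         .Split(' ')[1];
// Note: punctuation attached to the selected word itself (e.g. "good,")
// is not detected by this check.
afterWord = Char.IsLetterOrDigit(afterWord.First()) ?
            afterWord.TrimEnd(ignoredSpecialChars) :
            null;
// Drop empty entries so that Last() returns "thing," rather than the
// empty string left by the trailing space.
string beforeWord = phrase.Substring(0, selectedPosition)
                          .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                          .Last();
beforeWord = Char.IsLetterOrDigit(beforeWord.Last()) ?
             beforeWord.TrimStart(ignoredSpecialChars) :
             null;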
Upvotes: 3
Reputation: 236288
You can use the regular expression [^’a-zA-Z0-9]+ to split your string into words:
string[] words = Regex.Split(text, @"[^’a-zA-Z0-9]+");
Implementing navigation is up to you. Store the index of the selected word and use it to get the next or previous one:
string next = null, previous = null;
int index = Array.IndexOf(words, "living");
if (index >= 0 && index < words.Length - 1)
    next = words[index + 1]; // that’s
if (index > 0)
    previous = words[index - 1]; // the
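Since the question supplies a character index rather than a word position, one option is to match the words instead of splitting on the gaps, because each Match keeps its Index in the original string. A rough sketch, not tested against every edge case: WordNavigator and Neighbours are hypothetical names, the character class simply mirrors the one above, and punctuation between words is ignored here (so this is the version without the bonus).

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordNavigator
{
    // Hypothetical helper: finds the word starting at character offset
    // `charIndex` and returns its neighbours (null where there is none).
    public static (string Previous, string Next) Neighbours(string text, int charIndex)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // Match runs of word characters rather than splitting on separators,
        // so every word keeps its position in the original string.
        MatchCollection matches = Regex.Matches(text, @"[’a-zA-Z0-9]+");

        for (int i = 0; i < matches.Count; i++)
        {
            if (matches[i].Index != charIndex) continue;

            string previous = i > 0 ? matches[i - 1].Value : null;
            string next = i < matches.Count - 1 ? matches[i + 1].Value : null;
            return (previous, next);
        }

        throw new ArgumentException("No word starts at the given index.", nameof(charIndex));
    }
}
```

For the example string and index 41, this would give thing and against, since the comma is skipped as a separator.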
Upvotes: 0
Reputation: 1011
I haven't tested it yet, but it should work. You can just look at the substring before and after the word and then search for the first or the last " ". Then you know where the words start and end.
string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
string word = "good";
int index = 41;
string before = text.Substring(0, index).Trim(); // everything in front of the word, without the " " next to it
string after = text.Substring(index + word.Length).Trim(); // everything behind the word, without the " " next to it
int indexBefore = before.LastIndexOf(" "); // -1 if there is only one word in front
int indexAfter = after.IndexOf(" "); // -1 if there is only one word behind
string wordBefore = before.Substring(indexBefore + 1); // "thing,"
string wordAfter = indexAfter < 0 ? after : after.Substring(0, indexAfter); // "against"
EDIT
And if you want to ignore punctuation and commas, just remove them from your string (keep in mind that removing characters in front of the word shifts its index).
Upvotes: 0
Reputation: 32
Create a string with the punctuation and commas removed (use Remove). In that string, search for the substring "thing good against", and so on, if needed.
Upvotes: -2