Reputation: 33
Say I have a string
String str = "This problem sucks and is hard";
and I want to get the words before and after "problem", so "This" and "sucks". Is regex the best way to accomplish this (keeping in mind that I'm a beginner with regex), or does Java have some kind of library (e.g. StringUtils) that can do this for me?
Upvotes: 1
Views: 4126
Reputation: 444
A bit verbose, but this gets the job done accurately and quickly (it returns the word before the search word):
import java.io.*;
import java.util.*;
public class HelloWorld{
public static void main(String []args){
String EntireString="Hello World this is a test";
String SearchWord="World";
System.out.println(getPreviousWordFromString(EntireString,SearchWord));
}
public static String getPreviousWordFromString(String EntireString, String SearchWord) {
List<Integer> IndicesOfWords = new ArrayList<>();
boolean isWord = false;
int indexOfSearchWord=-1;
if(EntireString.indexOf(SearchWord)!=-1) {
indexOfSearchWord = EntireString.indexOf(SearchWord)-1;
} else {
System.out.println("ERROR: SearchWord passed (2nd arg) does not exist in string EntireString. EntireString: "+EntireString+" SearchWord: "+SearchWord);
return "";
}
if(EntireString.indexOf(SearchWord)==0) {
System.out.println("ERROR: The search word passed is the first word in the search string, so there are no words before it.");
return "";
}
for (int i = 0; i < EntireString.length(); i++) {
if (Character.isLetter(EntireString.charAt(i)) && i != indexOfSearchWord) {
isWord = true;
} else if (!Character.isLetter(EntireString.charAt(i)) && isWord) {
IndicesOfWords.add(i);
isWord = false;
} else if (Character.isLetter(EntireString.charAt(i)) && i == indexOfSearchWord) {
IndicesOfWords.add(i);
}
}
if(IndicesOfWords.size()>0) {
boolean isFirstWordAWord=true;
for (int i = 0; i < IndicesOfWords.get(0); i++) {
if(!Character.isLetter(EntireString.charAt(i))) {
isFirstWordAWord=false;
}
}
if(isFirstWordAWord==true) {
String firstWord = EntireString.substring(0,IndicesOfWords.get(0));
IndicesOfWords.add(0,0);
}
} else {
return "";
}
String ResultingWord = "";
for (int i = IndicesOfWords.size()-1; i >= 0; i--) {
if (EntireString.substring(IndicesOfWords.get(i)).contains(SearchWord)) {
if (i > 0) {
ResultingWord=EntireString.substring(IndicesOfWords.get(i-1),IndicesOfWords.get(i));
break;
}
if (i==0) {
ResultingWord=EntireString.substring(IndicesOfWords.get(0),IndicesOfWords.get(1));
}
}
}
return ResultingWord;
}
}
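For comparison, here is a more compact sketch of the same idea that returns both neighbors. It is not the code above: it swaps the character scan for a split on runs of non-word characters. The class name NeighborWordsBySplit and the method neighbors are made up for illustration, and the sketch assumes the search word is a single token with no punctuation inside it.

```java
import java.util.Arrays;
import java.util.List;

class NeighborWordsBySplit {
    // Returns {previous, next} for the first occurrence of 'word'.
    // An element is "" when there is no neighbor on that side;
    // the result is null when 'word' does not occur at all.
    static String[] neighbors(String text, String word) {
        // Split on runs of non-word characters, so punctuation is dropped.
        List<String> words = Arrays.asList(text.split("\\W+"));
        int i = words.indexOf(word);
        if (i < 0) {
            return null;
        }
        String previous = i > 0 ? words.get(i - 1) : "";
        String next = i < words.size() - 1 ? words.get(i + 1) : "";
        return new String[] { previous, next };
    }

    public static void main(String[] args) {
        String[] result = neighbors("This problem sucks and is hard", "problem");
        System.out.println(result[0] + ", " + result[1]); // prints "This, sucks"
    }
}
```

Because the comparison is on whole tokens, searching for "problem" will not match "problems".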
Upvotes: 0
Reputation: 159096
To find the words before and after a given word, you can use this regex:
(\w+)\W+problem\W+(\w+)
The capture groups are the words you're looking for.
In Java, that would be:
Pattern p = Pattern.compile("(\\w+)\\W+problem\\W+(\\w+)");
Matcher m = p.matcher("This problem sucks and is hard");
if (m.find())
System.out.printf("'%s', '%s'", m.group(1), m.group(2));
Output
'This', 'sucks'
If you want full Unicode support, add the flag UNICODE_CHARACTER_CLASS, or inline it as (?U):
Pattern p = Pattern.compile("(?U)(\\w+)\\W+problema\\W+(\\w+)");
Matcher m = p.matcher("Questo problema è schifoso e dura");
if (m.find())
System.out.printf("'%s', '%s'", m.group(1), m.group(2));
Output
'Questo', 'è'
To find multiple matches, use a while loop:
Pattern p = Pattern.compile("(?U)(\\w+)\\W+problems\\W+(\\w+)");
Matcher m = p.matcher("Big problems or small problems, they are all just problems, man!");
while (m.find())
System.out.printf("'%s', '%s'%n", m.group(1), m.group(2));
Output
'Big', 'or'
'small', 'they'
'just', 'man'
Note: The use of \W+ allows symbols to occur between words, e.g. "No(!) problem here" will still find "No" and "here".
Also note that a number is considered a word: "I found 1 problem here" returns "1" and "here".
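If the search word is not hard-coded, the same pattern can be built at runtime. The following is a sketch under a few assumptions: the class name NeighborWordsByRegex and the method neighbors are hypothetical, and Pattern.quote is used so that a search word containing regex metacharacters (such as "c++") is matched literally. Like the pattern above, it only matches when the word has a neighbor on both sides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NeighborWordsByRegex {
    // Returns one {before, after} pair per match of 'word' in 'text'.
    static List<String[]> neighbors(String text, String word) {
        // Pattern.quote turns the search word into a literal,
        // so characters like '+' or '.' cannot change the regex.
        Pattern p = Pattern.compile("(\\w+)\\W+" + Pattern.quote(word) + "\\W+(\\w+)");
        Matcher m = p.matcher(text);
        List<String[]> result = new ArrayList<>();
        while (m.find()) {
            result.add(new String[] { m.group(1), m.group(2) });
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] pair : neighbors("This problem sucks and is hard", "problem")) {
            System.out.printf("'%s', '%s'%n", pair[0], pair[1]);
        }
    }
}
```

A search word at the very start or end of the string produces no match here; handling that case would need optional groups or a different approach.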
Upvotes: 2
Reputation: 3075
Apache Commons Lang has a StringUtils class with methods that return the substring before or after a given string. Alternatively, you can use Java's own substring method to get what you need.
Apache StringUtils library API: https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html
The methods you might need are substringBefore() and substringAfter().
If you want to explore Java's own APIs, see "Java: Getting a substring from a string starting after a particular character".
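Keep in mind that substringBefore() and substringAfter() return everything before or after the first occurrence of the separator, not a single word, so you still have to pick out the neighboring word yourself. A rough sketch of that extra step using only Java's own indexOf and substring follows. The class name NeighborWordsBySubstring and the methods wordBefore and wordAfter are made up for illustration, and the sketch assumes words are separated by spaces. Punctuation stays attached to the words, and indexOf will also match the search word inside a longer word.

```java
class NeighborWordsBySubstring {
    // Word immediately before the first occurrence of 'word', or "" if there is none.
    static String wordBefore(String text, String word) {
        int idx = text.indexOf(word);
        if (idx < 0) {
            return "";
        }
        // Everything before the search word, without surrounding whitespace.
        String before = text.substring(0, idx).trim();
        // Keep only what follows the last space (the whole string if there is no space).
        return before.substring(before.lastIndexOf(' ') + 1);
    }

    // Word immediately after the first occurrence of 'word', or "" if there is none.
    static String wordAfter(String text, String word) {
        int idx = text.indexOf(word);
        if (idx < 0) {
            return "";
        }
        // Everything after the search word, without surrounding whitespace.
        String after = text.substring(idx + word.length()).trim();
        int space = after.indexOf(' ');
        return space < 0 ? after : after.substring(0, space);
    }

    public static void main(String[] args) {
        String str = "This problem sucks and is hard";
        System.out.println(wordBefore(str, "problem")); // prints "This"
        System.out.println(wordAfter(str, "problem"));  // prints "sucks"
    }
}
```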
Upvotes: 0