Reputation: 135
I need to find lines that match either of the following patterns:
preposition word ||| other words or whatever
word preposition ||| other words or whatever
The preposition may be any word from a list like {de, à, pour, quand, ...}; the other word may or may not be a preposition.
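A minimal sketch of turning such a list into a single regex, assuming the list is available as a List<String>. The class and method names (PrepositionList, prepositionPattern) are hypothetical, and the four words are just the examples from the list above. Pattern.quote escapes each entry, and UNICODE_CHARACTER_CLASS is added so that \b treats accented letters such as à as word characters.

```java
import java.util.List;
import java.util.regex.Pattern;

public class PrepositionList {

    // Hypothetical helper: joins the words into one non-capturing alternation
    // wrapped in word boundaries. Pattern.quote escapes each word so that any
    // regex metacharacters in a list entry are matched literally.
    public static Pattern prepositionPattern(List<String> words) {
        StringBuilder alt = new StringBuilder();
        for (String w : words) {
            if (alt.length() > 0) {
                alt.append('|');
            }
            alt.append(Pattern.quote(w));
        }
        // UNICODE_CHARACTER_CLASS makes \b and \w treat accented letters as
        // word characters (it also enables Unicode-aware case folding).
        return Pattern.compile("\\b(?:" + alt + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
    }

    public static void main(String[] args) {
        // Example entries from the list in the question; the second one is a-grave.
        Pattern p = prepositionPattern(List.of("de", "\u00e0", "pour", "quand"));
        System.out.println(p.matcher("Quand").matches());            // true: whole word, any case
        System.out.println(p.matcher("pourquoi").matches());         // false: "pour" is only a prefix
        System.out.println(p.matcher("parler \u00e0 Paul").find());  // true
    }
}
```

The resulting alternation can then be spliced into a larger pattern wherever the preposition is expected.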
I tried many patterns, like the following:
File file = new File("test.txt");
Pattern pattern = Pattern.compile("(\\bde\\b|\\bà\\b) \\w.*",Pattern.CASE_INSENSITIVE);
String fileContent = readFileAsString(file.getAbsolutePath());
Matcher match = pattern.matcher(fileContent);
System.out.println( match.replaceAll("c"));
This pattern matches a preposition followed by one or more words before the pipe. What I want is to match a preposition followed by exactly one word before the pipe. I tried the following pattern:
Pattern pattern = Pattern.compile("(\\bde\\b|\\bla\\b)\\s\\w\\s\\|.*",Pattern.CASE_INSENSITIVE);
Unfortunately, this pattern doesn't work!
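To illustrate why this attempt fails, the sketch below compares it with the same pattern using \w+ instead of \w: \w matches exactly one word character, not a whole word. The class name and the sample lines are hypothetical.

```java
import java.util.regex.Pattern;

public class SingleCharWordDemo {

    // The attempt above: \w matches exactly ONE word character.
    public static final Pattern ONE_CHAR = Pattern.compile(
            "(\\bde\\b|\\bla\\b)\\s\\w\\s\\|.*", Pattern.CASE_INSENSITIVE);

    // Same pattern with \w+ : one or more word characters, i.e. one whole word.
    public static final Pattern ONE_WORD = Pattern.compile(
            "(\\bde\\b|\\bla\\b)\\s\\w+\\s\\|.*", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String line = "la maison ||| other words"; // hypothetical sample line
        System.out.println(ONE_CHAR.matcher(line).find());                   // false: "maison" has 6 characters
        System.out.println(ONE_WORD.matcher(line).find());                   // true
        System.out.println(ONE_CHAR.matcher("la x ||| other words").find()); // true: one-character word
    }
}
```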
Upvotes: 3
Views: 281
Reputation: 14699
For the sake of conciseness, I'm just going to use prep to stand in for any preposition we could be dealing with:
Pattern pattern = Pattern.compile("(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*",
Pattern.CASE_INSENSITIVE);
(?:...) means group, but do not capture.
\\bprep\\b ensures that prep is matched only as a whole word, i.e. it won't match the prep in preposition.
\\w+ requires one or more characters from [a-zA-Z_0-9].
.* at the end applies to both alternatives.
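One way to exercise the pattern above is with Matcher.matches(), which requires the whole string to match; this is an assumption, since the answer does not say how the pattern is applied. The class name and sample strings are illustrative only. The last case shows that the trailing .* does not restrict what follows the first word, so this pattern on its own does not enforce exactly one word before the pipe.

```java
import java.util.regex.Pattern;

public class PrepAlternativesDemo {

    // The pattern from the answer; "prep" stands in for a real preposition.
    static final Pattern PREP = Pattern.compile(
            "(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*",
            Pattern.CASE_INSENSITIVE);

    // matches() succeeds only if the pattern covers the entire string.
    public static boolean matchesWholeLine(String s) {
        return PREP.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWholeLine("prep word ||| rest"));        // true: first alternative
        System.out.println(matchesWholeLine("word prep ||| rest"));        // true: second alternative
        System.out.println(matchesWholeLine("preposition word ||| rest")); // false: \b rejects "prep" inside "preposition"
        System.out.println(matchesWholeLine("prep one two ||| rest"));     // true: .* accepts anything after the first word
    }
}
```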
EDIT (in response to comment):
"^(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*" is working; you're most likely just running into a case like:
String myString = "hello prep someWord mindless nonsense";
This matches because it is captured by the second alternative, (?:\\w+ \\bprep\\b), followed by .* .
If you try these, you'll see that the ^ is in fact working:
String myString = "egeg  prep rfb tgnbv";
This doesn't match the second case, since there are 2 spaces after "egeg", so it could only match the first case, and it does not because of the ^. Additionally:
String myString = "egeg hello prep rfb tgnbv";
We've established that a case like this won't match the first alternative, and it also won't match the second, which means the ^ is in fact working.
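The three cases above can be reproduced with Matcher.find(), which may start a match anywhere in the string unless the pattern begins with ^ (without the MULTILINE flag, ^ anchors to the start of the input). The class and field names are hypothetical.

```java
import java.util.regex.Pattern;

public class AnchorDemo {

    static final String BODY = "(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*";
    public static final Pattern UNANCHORED = Pattern.compile(BODY, Pattern.CASE_INSENSITIVE);
    public static final Pattern ANCHORED = Pattern.compile("^" + BODY, Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String[] samples = {
            "hello prep someWord mindless nonsense", // second alternative at the start
            "egeg  prep rfb tgnbv",                  // two spaces after "egeg"
            "egeg hello prep rfb tgnbv"              // "prep" is the third word
        };
        for (String s : samples) {
            // find() scans the whole string; only the ^ version is tied to index 0.
            System.out.println(UNANCHORED.matcher(s).find() + " " + ANCHORED.matcher(s).find());
        }
    }
}
```

This should print true true, then true false, then true false: without ^, the last two strings still contain a match further in, which is exactly what the anchor rules out.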
Upvotes: 1
Reputation: 135
Thank you all for your answers. In fact, as @Pshemo said, I just had to add + after \w. I thought \w meant a whole word, but it matches a single word character; \w+ matches one or more. It works now with the following code:
File file = new File("test.txt");
Pattern pattern = Pattern.compile("(\\bde\\b|\\bla\\b)\\s\\w+\\s\\|.*|\\w+\\s(\\bde\\b|\\bla\\b)\\s\\|.*", Pattern.CASE_INSENSITIVE);
String fileContent = readFileAsString(file.getAbsolutePath());
Matcher match = pattern.matcher(fileContent);
System.out.println( match.replaceAll(""));
As input, for example, I have the following lines:
the world |||something here|||other things here
world about |||something here|||other things here
another example ||| something here|||other things here
the final and the last example|||something here|||other things here
Then, supposing the list of prepositions is {the, about}, the output will be:
another example ||| something here|||other things here
the final and the last example|||something here|||other things here
As you can see, I just want to match the first two lines and remove them.
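A sketch that applies the final pattern line by line, with the example prepositions {the, about} substituted for {de, la} (an assumption for illustration). Filtering whole lines, rather than calling replaceAll("") on the entire file content, avoids the empty lines that replaceAll can leave behind, since . does not match line terminators by default. The class and method names are hypothetical; with a real file, the input list could come from java.nio.file.Files.readAllLines(Path.of("test.txt")).

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FilterLines {

    // The final pattern from above, with {the, about} in place of {de, la}.
    static final Pattern PATTERN = Pattern.compile(
            "(\\bthe\\b|\\babout\\b)\\s\\w+\\s\\|.*|\\w+\\s(\\bthe\\b|\\babout\\b)\\s\\|.*",
            Pattern.CASE_INSENSITIVE);

    // Keeps only the lines in which the pattern finds no match.
    public static List<String> removeMatchingLines(List<String> lines) {
        return lines.stream()
                .filter(line -> !PATTERN.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "the world |||something here|||other things here",
                "world about |||something here|||other things here",
                "another example ||| something here|||other things here",
                "the final and the last example|||something here|||other things here");
        removeMatchingLines(input).forEach(System.out::println);
    }
}
```

Run on the sample input, this keeps only the last two lines, matching the expected output above.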
Upvotes: 0