Dorra

Reputation: 135

Regex with Java

I need to check for lines that have either one of the following patterns:

preposition word ||| other words or what ever
word preposition ||| other words or what ever

The preposition may be any word from a list such as {de, à, pour, quand, ...}; the other word may or may not be a preposition itself.

I tried many patterns, like the following:

File file = new File("test.txt");
Pattern pattern = Pattern.compile("(\\bde\\b|\\bà\\b) \\w.*", Pattern.CASE_INSENSITIVE);
String fileContent = readFileAsString(file.getAbsolutePath());
Matcher match = pattern.matcher(fileContent);
System.out.println(match.replaceAll("c"));

This pattern matches a preposition followed by at least one word before the pipe. What I want is to match a preposition followed by exactly one word before the pipe. I tried the following pattern:

Pattern pattern = Pattern.compile("(\\bde\\b|\\bla\\b)\\s\\w\\s\\|.*",Pattern.CASE_INSENSITIVE);

Unfortunately, this pattern doesn't work!

Upvotes: 3

Views: 281

Answers (2)

Steve P.

Reputation: 14699

For the sake of conciseness, I'm going to use prep to stand in for whichever preposition we are dealing with:

Pattern pattern = Pattern.compile("(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*",
                                 Pattern.CASE_INSENSITIVE);    

(?:...) groups without capturing
\\bprep\\b ensures that prep is matched only as a whole word, i.e. it won't match the prep inside preposition
\\w+ requires one or more word characters, [a-zA-Z_0-9] by default
.* at the end applies to both alternatives in the parentheses
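
To see these pieces working together, here is a minimal sketch that substitutes "de" for prep and checks a few lines. The class name, the method name matchesLine, and the sample lines are assumptions made up for illustration; they are not from the question.

```java
import java.util.regex.Pattern;

class PrepPatternDemo {
    // "de" stands in for prep; any preposition from the list could be substituted.
    static final Pattern PATTERN = Pattern.compile(
            "(?:(?:\\bde\\b \\w+)|(?:\\w+ \\bde\\b)).*",
            Pattern.CASE_INSENSITIVE);

    // matches() requires the whole line to fit the pattern.
    static boolean matchesLine(String line) {
        return PATTERN.matcher(line).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, one per case.
        System.out.println(matchesLine("de maison ||| autre chose"));    // preposition first
        System.out.println(matchesLine("maison de ||| autre chose"));    // preposition second
        System.out.println(matchesLine("demain matin ||| autre chose")); // "de" is only a prefix
    }
}
```

The first two calls return true and the third returns false, because \\b rejects the "de" at the start of "demain".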

EDIT (in response to comment):
"^(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*" is working; you are most likely running into a case like this:

String myString = "hello prep someWord mindless nonsense";

This will match, since it is covered by the second case, (?:\\w+ \\bprep\\b), followed by .*.

If you try these, you'll see that the ^ is in fact working:

String myString = "egeg  prep rfb tgnbv";

This doesn't match the second case, since there are two spaces after "egeg". It could therefore only match the first case, but the ^ prevents that because the string does not start with prep. Additionally:

String myString = "egeg hello prep rfb tgnbv";

We've established that a case like this won't match the first, and it also won't match the second, meaning that the ^ is in fact working.
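
The three strings above can be checked side by side, with and without the ^. This is only a sketch: the class name and the found helper are invented for the comparison, and find() is assumed as the matching call since the original comment does not say which one was used.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    static final String BODY = "(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*";
    static final Pattern ANCHORED = Pattern.compile("^" + BODY, Pattern.CASE_INSENSITIVE);
    static final Pattern UNANCHORED = Pattern.compile(BODY, Pattern.CASE_INSENSITIVE);

    // find() looks for a match anywhere in the input; ^ restricts it to the start.
    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "hello prep someWord mindless nonsense",
            "egeg  prep rfb tgnbv",
            "egeg hello prep rfb tgnbv"
        };
        for (String s : samples) {
            System.out.println(s + " | anchored=" + found(ANCHORED, s)
                    + " unanchored=" + found(UNANCHORED, s));
        }
    }
}
```

With the anchor, only the first string is found. Without it, all three are found, because the match is free to start at "prep rfb" or "hello prep" in the middle of the string.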

Upvotes: 1

Dorra

Reputation: 135

Thank you all for your answers. As @Pshemo said, I just had to add + after \w. I thought that \w meant a whole word, but it matches a single word character. It works now with the following code:

File file = new File("test.txt");
Pattern pattern = Pattern.compile("(\\bde\\b|\\bla\\b)\\s\\w+\\s\\|.*|\\w+\\s(\\bde\\b|\\bla\\b)\\s\\|.*", Pattern.CASE_INSENSITIVE);
String fileContent = readFileAsString(file.getAbsolutePath());
Matcher match = pattern.matcher(fileContent);
System.out.println(match.replaceAll(""));

As input, for example, I have the following lines:

the world |||something here|||other things here

world about |||something here|||other things here

another example ||| something here|||other things here

the final and the last example|||something here|||other things here

Then, supposing that the list of prepositions is {the, about}, the output will be:

another example ||| something here|||other things here

the final and the last example|||something here|||other things here

As you can see, I just want to match the first two lines and remove them.
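
For reference, the same example can be reproduced without reading a file. The sketch below is a variation, not the code above: it filters the content line by line with matches() instead of calling replaceAll on the whole string, so a removed line leaves no empty line behind. The class and method names are hypothetical, and the preposition list is the {the, about} from the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PrepositionLineFilter {
    // Preposition list assumed from the example: {the, about}.
    private static final String PREP = "(?:the|about)";

    // A line is dropped when it has the form "prep word |..." or "word prep |...".
    private static final Pattern LINE = Pattern.compile(
            "(?:" + PREP + "\\s\\w+|\\w+\\s" + PREP + ")\\s\\|.*",
            Pattern.CASE_INSENSITIVE);

    // Splits on any line break and keeps only the lines that do not match.
    static String removeMatchingLines(String content) {
        List<String> kept = new ArrayList<>();
        for (String line : content.split("\\R")) {
            if (!LINE.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String content = String.join("\n",
                "the world |||something here|||other things here",
                "world about |||something here|||other things here",
                "another example ||| something here|||other things here",
                "the final and the last example|||something here|||other things here");
        System.out.println(removeMatchingLines(content));
    }
}
```

Because matches() must cover the whole line, a line is removed only when the preposition pair sits at its very start, which is slightly stricter than the unanchored replaceAll version.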

Upvotes: 0
