Batakj

Reputation: 12743

How to search for a string and similar words in a text?

I have to look up the word "age" and similar words in a text file.

I have the following sentences:

String.contains always returns true in every case. My requirement is that the first five sentences pass and the last one returns false.

I could solve this by writing code that checks for a set of strings such as " age ", " age.", "ages", "aged", " age,", etc.

Is there a better way to solve this problem?

Upvotes: 3

Views: 1701

Answers (3)

Saitama

Reputation: 133

What you need is called a regular expression (or regex).

Here's a detailed explanation of regular expressions and their use in Java. The simplest entry point is the matches(String regex) method of the String class.

For your example, it could (normally) be: myString.matches(".*age.? .*").

Pay attention to escaping special characters in Java. You can try your regexes here. I didn't do that for the example above, but you can try :)

In detail:

  • .* : the sentence can begin with anything
  • age : the sentence must contain 'age'
  • .? : 'age' can be followed by zero or one character
  • ' ' : then a space
  • .* : then anything again
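A small sketch of how the ? quantifier behaves may help here: in Java regex it applies only to the token immediately before it, so "age?" makes the 'e' optional, while "age.?" allows one extra character after "age". The class name, method names and sample sentences below are made up for illustration.

```java
public class QuantifierDemo {

    // String.matches succeeds only if the WHOLE input matches the pattern.
    // Here '?' applies to the 'e', so "ag " is enough for a match.
    static boolean optionalE(String s) {
        return s.matches(".*age? .*");
    }

    // Here '?' applies to '.', so "age" may be followed by one extra character.
    static boolean optionalAnyChar(String s) {
        return s.matches(".*age.? .*");
    }

    public static void main(String[] args) {
        String[] samples = { "a bag of rice", "He aged well", "What is your age?" };
        for (String s : samples) {
            System.out.println(s + " | age? -> " + optionalE(s)
                    + " | age.? -> " + optionalAnyChar(s));
        }
    }
}
```

Note that both patterns require a space after the match, so a sentence that ends with "age?" or "age." is rejected by either one.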

Hope it helped.

Upvotes: 1

vefthym

Reputation: 7462

A naive solution (expensive) would be the following:

  1. tokenize each line (e.g., split by " ", or even by non-alphanumeric characters, which already removes punctuation).
  2. calculate the edit distance of each word to the word "age"
  3. if the current word has a small edit distance (e.g., below 2), return the line

The edit distance of two strings is the number of edits (additions, deletions and replacements) that are required to make one string equal to the other. You can find an implementation of edit distance in the simmetrics library, or maybe elsewhere, too.
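If you would rather not pull in a library, the three steps can be sketched with a hand-written Levenshtein distance. This is only an illustration: the class and method names are hypothetical, it is not the simmetrics API, and the threshold of 1 is my reading of "below 2".

```java
public class EditDistanceSearch {

    // Classic dynamic-programming Levenshtein distance, keeping two rows only.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Steps 1-3: tokenize on non-alphanumeric characters, compare each token
    // to the target, and accept the line if any token is close enough.
    static boolean lineMatches(String line, String target, int maxDistance) {
        for (String token : line.toLowerCase().split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty() && editDistance(token, target) <= maxDistance) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(lineMatches("She aged gracefully.", "age", 1));
        System.out.println(lineMatches("I sent a message.", "age", 1));
    }
}
```

Be aware that a distance of 1 also accepts unrelated words such as "are", "ago" or "page", so this approach trades precision for flexibility.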

Another option could be to stem the words at step 2 and use contains with the stem of the word "age" (also expensive).

If you already know all the acceptable answers (or at least their pattern), go for Avinash Raj's answer.

Upvotes: 1

Avinash Raj

Reputation: 174696

If you use a regex, you have to list all the possibilities.

string.matches("(?i).*\\bage[ds]?\\b.*");
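For repeated checks, the same idea can be sketched with a precompiled Pattern and Matcher.find(), which removes the need for the leading and trailing .* parts. The class name, method name and sample sentences are made up for illustration, since the original sentences are not shown in the question.

```java
import java.util.regex.Pattern;

public class AgeMatcher {

    // \b is a word boundary, so "age", "ages" and "aged" match as whole words
    // while "message" or "page" do not. CASE_INSENSITIVE replaces (?i).
    private static final Pattern AGE =
            Pattern.compile("\\bage[ds]?\\b", Pattern.CASE_INSENSITIVE);

    static boolean mentionsAge(String text) {
        // find() looks for the pattern anywhere in the input.
        return AGE.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "What is your age?",
            "Ages ago, things were different.",
            "He aged well.",
            "I sent a message."
        };
        for (String s : samples) {
            System.out.println(s + " -> " + mentionsAge(s));
        }
    }
}
```

One practical difference: with String.matches, the dot does not match line terminators by default, so the .* version fails on multi-line input, whereas find() is not affected by that.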

Upvotes: 3
