zengr
zengr

Reputation: 38919

How can this Java code be improved to find sub-string in a string?

I was recently asked to submit a solution to a problem for a job.

Problem: Find a sub-string in a string.

Input: "Little star's deep dish pizza sure is fantastic."  
Search: "deep dish pizza"  
Output: "Little star's [[HIGHLIGHT]]deep dish pizza[[ENDHIGHLIGHT]] sure is fantastic."

Note that the highlighter doesn't have to have the exact same result on this example, since you are defining what a good snippet is and return the the most relevant snippet with the query terms highlighted.

The most important requirement was to write it as I would write a production code.

My solution was not accepted. How could I have improved it? I know, I could have used:

  1. Knuth–Morris–Pratt algorithm
  2. Regex (could I?)

My QUESTION:

  1. What do tech companies take into consideration when they review a code for a job. I submitted the code the same day, does that help in any way?

  2. In one of the comments, it pointed out, it looks like a school code than production code. How? Any suggestions?

My solution:
FindSubString.java

/**
 * FindSubString.java: Find sub-string in a given query
 * 
 * @author zengr
 * @version 1.0
 */

public class FindSubstring {
    private static final String startHighlight = "[[HIGHLIGHT]]";
    private static final String endHighlight = "[[ENDHIGHLIGHT]]";

    /**
     * Find sub-string in a given query
     * 
     * @param inputQuery: A string data type (input Query)
     * @param highlightDoc: A string data type (pattern to match)
     * @return inputQuery: A String data type.
     */
    public String findSubstringInQuery(String inputQuery, String highlightDoc) {
        try {

            highlightDoc = highlightDoc.trim();

            if (inputQuery.toLowerCase().indexOf(highlightDoc.toLowerCase()) >= 0) {
                // update query if exact doc exists
                inputQuery = updateString(inputQuery, highlightDoc);
            }

            else {
                // If exact doc is not in the query then break it up
                String[] docArray = highlightDoc.split(" ");

                for (int i = 0; i < docArray.length; i++) {
                    if (inputQuery.toLowerCase().indexOf(docArray[i].toLowerCase()) > 0) {
                        inputQuery = updateString(inputQuery, docArray[i]);
                    }
                }
            }
        } catch (NullPointerException ex) {
            // Ideally log this exception
            System.out.println("Null pointer exception caught: " + ex.toString());
        }

        return inputQuery;
    }

    /**
     * Update the query with the highlighted doc
     * 
     * @param inputQuery: A String data type (Query to update)
     * @param highlightDoc: A String data type (pattern around which to update)
     * @return inputQuery: A String data type.
     */
    private String updateString(String inputQuery, String highlightDoc) {
        int startIndex = 0;
        int endIndex = 0;

        String lowerCaseDoc = highlightDoc.toLowerCase();
        String lowerCaseQuery = inputQuery.toLowerCase();

        // get index of the words to highlight
        startIndex = lowerCaseQuery.indexOf(lowerCaseDoc);
        endIndex = lowerCaseDoc.length() + startIndex;

        // Get the highlighted doc
        String resultHighlightDoc = highlightString(highlightDoc);

        // Update the original query
        return inputQuery = inputQuery.substring(0, startIndex - 1) + resultHighlightDoc + inputQuery.substring(endIndex, inputQuery.length());
    }

    /**
     * Highlight the doc
     * 
     * @param inputString: A string data type (value to be highlighted)
     * @return highlightedString: A String data type.
     */
    private String highlightString(String inputString) {
        String highlightedString = null;

        highlightedString = " " + startHighlight + inputString + endHighlight;

        return highlightedString;
    }
}

TestClass.java

/**
 * TestClass.java: jUnit test class to test FindSubString.java
 * 
 * @author zengr
 * @version 1.0
 */

import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;

public class TestClass extends TestCase
{
    private FindSubstring simpleObj = null;
    private String originalQuery = "I like fish. Little star's deep dish pizza sure is fantastic. Dogs are funny.";

    public TestClass(String name) {
        super(name);
    }

    public void setUp() { 
        simpleObj = new FindSubstring();
    }

    public static Test suite(){

        TestSuite suite = new TestSuite();
        suite.addTest(new TestClass("findSubstringtNameCorrect1Test"));
        suite.addTest(new TestClass("findSubstringtNameCorrect2Test"));
        suite.addTest(new TestClass("findSubstringtNameCorrect3Test"));
        suite.addTest(new TestClass("findSubstringtNameIncorrect1Test"));
        suite.addTest(new TestClass("findSubstringtNameNullTest"));

        return suite;
    }

    public void findSubstringtNameCorrect1Test() throws Exception
    {
        String expectedOutput = "I like fish. Little star's deep [[HIGHLIGHT]]dish pizza[[ENDHIGHLIGHT]] sure is fantastic. Dogs are funny.";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery(originalQuery, "dish pizza"));
    }

    public void findSubstringtNameCorrect2Test() throws Exception 
    {
        String expectedOutput = "I like fish. Little star's [[HIGHLIGHT]]deep dish pizza[[ENDHIGHLIGHT]] sure is fantastic. Dogs are funny.";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery(originalQuery, "deep dish pizza"));
    }

    public void findSubstringtNameCorrect3Test() throws Exception 
    {
        String expectedOutput = "Hello [[HIGHLIGHT]]how[[ENDHIGHLIGHT]] are [[HIGHLIGHT]]you[[ENDHIGHLIGHT]]r?";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery("Hello how are your?", "how you"));
    }

    public void findSubstringtNameIncorrect1Test() throws Exception 
    {
        String expectedOutput = "I like fish. Little star's deep dish pizza sure is fantastic. Dogs are funny.";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery(originalQuery, "I love Ruby too"));
    }

    public void findSubstringtNameNullTest() throws Exception
    {
        String expectedOutput = "I like fish. Little star's deep dish pizza sure is fantastic. Dogs are funny.";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery(originalQuery, null));

    }
}

Upvotes: 10

Views: 1542

Answers (6)

nas
nas

Reputation: 977

It sounds like you missed the point of the problem. The original problem statement says:

Note that the highlighter doesn't have to have the exact same result on this example, since you are defining what a good snippet is and return the the most relevant snippet with the query terms highlighted.

It sounds like they wanted you to identify a good snippet to return, not to just highlight words in the original input. If the input was long, you'd want to return a smaller snippet of text with the highlighted words.

One possible approach would be:

  • Find the smallest snippet of text that contains all inputs (or the most possible inputs).
  • Expand the snippet to include the enclosing sentences if possible.
  • If the result is greater than a max length, say 100, prune the
    snippet, possibly by inserting ellipsis somewhere within.

Things like good sentence identification, word stemming, and spelling corrections might also be within scope, especially if you are allowed to use third party libraries.

Upvotes: 2

Qwerky
Qwerky

Reputation: 18455

A few comments;

  • You only highlight the first occurance of the search string.
  • You assume that lower case matching is fine. Unless this was specified as a requirement it might be better to provide two methods, one that respects case and one that ignores case.
  • I would probably check the given parameters and throw a NPE if either of them were null. This would be the first thing my method did. I would clearly document this behaviour in the javadoc.
  • Your method mame is bad; findSubstringInQuery's main task isn't to find, it is to highlight and the inQuery part is superflous. Just call the method highlight or maybe highlightIgnoreCase if you are going to have a highlight that respects case.
  • Your method parameter names are bad. I've looked at your method signature 10 times and still have to look at the method body to remind myself which arg is the search term and which is the text to search. Call them searchTerm and text.
  • Production code doesn't use the default package.
  • Production code doesn't use System.out.println().
  • Your javadoc need improving, it needs to tell the user everything they need to know about the code.
  • I would consider using static methods for a class with no class variables.
  • I would also consider allowing the user to specify their own start and end highlighting markers (I wouldn't use static methods if I did this).
  • I wouldn't trim() unless this was specified as a requirement. If I did, then obviously this behaviour would be documented in the javadoc.

I wouldn't worry about the algorithm used for searching, Knuth-Morris-Pratt looks good but they shouldn't expect you to know about it and implement it unless the job spec specifically asked for experience/expertise in string searching.

Upvotes: 8

Grodriguez
Grodriguez

Reputation: 22025

If this code was submitted to me for review, this is what I would think:

  • The code is overly verbose and complex where it does not need to be. The whole thing can be done in a small method in ten lines of code, including all sanity checks. This method should probably be static.
  • Your code is doing things that (I assume) were not asked for. You were being asked to search for a substring in a string. If it is not found, then that's fine -- no need to split the substring into words and search for each individual word.
  • Unless they asked you to remove leading and trailing whitespace, I would not have called trim()
  • I would not have included the calls to toLowerCase() unless explicitly asked for, although I would have added a comment stating that they could be added if needed. Anyway even if the search is meant to be case insensitive, there are too many redundant calls to toLowerCase() in your code.
  • You should not need to catch NullPointerException -- instead, you should ensure that this exception is never thrown.

Upvotes: 3

Arne Deutsch
Arne Deutsch

Reputation: 14789

As they seem to emphasize that you can define what a good output is ... perhaps how do you do the parsing is not what they want to know. Maybe they want that you realize that marking the text in the string with a marker is not a very good solution. If the result is to be used in the program for further processing something like this might be more appropriate.

class MarkedText {
    String highlightDoc;
    String inputQuery;
    List<Range> ranges;
}

class Range {
    int offset;
    int length;
}

public MarkedText findSubstringInQuery(String inputQuery, String highlightDoc) {
    [...]
}

Upvotes: 1

drekka
drekka

Reputation: 21903

I don't know if I'm missing something, but I'd write a simple block of code using indexOf() and other string methods. Depending on the defintion of the problem, I'd probably use a StringBuilder.insert() method to inject the highlighting. I'd not be looking do to tokenising and looping as you have done here. I'd justify it by saying that it's the simplest solution to the problem as specified. KISS is the best approach to open questions like this.

Although they have given you the inputs and outputs, did they specify what was to happen if the inputs changed and there was no match?

I also notice the catching of a null pointer. Good idea if you know where it's likely to occur and intend to do something about it. But as you are just logging, perhaps you should have done a more general catch. For example, can your code trigger a IndexOutOfBoundsException. So I'd be looking to catch Exception or Throwable.

The other thing I'd be asking is a definition of what they consider "Production code". At first glance it sounds simple enough, but my experience is that it can be interpreted in many different ways.

The real problem is that they will be expecting certain things, and you don't know them. So you code what works for you and hope that it matches what they expect.

Upvotes: 1

David Gelhar
David Gelhar

Reputation: 27900

Do you mean to be matching on partial words in the case where you break up the query? You also aren't accounting for the possibility that the search string contains the word "HIGHLIGHT".

For example, try this:

Input: "Little star's deep dish pizza sure is fantastic."
Search: "a HIGHLIGHT"
Output: (probably not what you want)

Upvotes: -1

Related Questions