JWyatt
JWyatt

Reputation: 129

Most efficient way to search String array for substring and return multiple values?

Say you're given the array containing:

Jordan

Jord

Anna

Rob

RobJord

And you want to return an array of all values that contain Jord (i.e Jord, Jordan, RobJord), what is the most efficient way to do this.

I'm using Java however I'm not allowed to use java.util Array functions.

Upvotes: 1

Views: 533

Answers (3)

user3059511
user3059511

Reputation:

It would take a bit of code to get everything set up and it would be horrible style, but you could transfer your Strings into char arrays, and have an int array which represents the ascii values of the letters in "Jord", so that you gain the benefit of checking by primitive rather than object referencing. Pass the chars you're checking against into a conditional block that evaluates it with the int values of

'J', 'o', 'r', 'd' //74, 111, 114, 100

Again, I only suggest this craziness because you have so much emphasis on efficiency. Right off the bat I'll say there's an efficiency drawback of the time it takes to transfer everything over to chars. The benefit would be best seen in large processing tasks, such as checking for Jord in an entire 1000 page eBook because the initializing only happens once (or in large chunks I suppose with huge data perhaps, but still beneficial either way)

//assuming its case sensitive: ascii values for 'J' 'o' 'r' 'd'
int[] charArr = new int[]{74, 111, 114, 100};

Again, it requires some setting up which hinders performance, plus its just weird, but it does give you the benefit of validating by primitive int.

Another thought would be to consider the statistics of certain letters being followed by another letter. For example, the likelihood of "J" being followed by any vowel is extremely high, and thus "J" being followed by "o" yet still not being "Jord" is therefore extremely high since we only have 5 vowels(plus y, that weird one...) You might get "Jork" for example and you've wasted checking "o" and "r". So with that being said, perhaps it would be better to move the scanner up a few letters (or your current array index counter - whichever way you're iterating) to check for "d" after you've established a match for "J". I think that would increase the efficiency.

Basically I'm saying if you construct it in such a way that it checks letter by letter in an iterating manner, Step one would be to match "J", and then step 2 would be to skip over "o" and check for "r" or "d" instead. Or in other words, find a candidate, and eliminate candidates aggressively

EDIT: I'd actually say check for "d" in step 2 and don't consider checking "r" until step 3 if step 2 checks out because that way your code will be simpler - start at the start, move to the end, then iterate backwards to the start+1. If you check for "r" in step 2 then step 3 and 4 will be zigzagging indices to traverse

Upvotes: 0

user2864740
user2864740

Reputation: 62005

Well, since this sounds like homework, it's for you to solve, but I would consider this very-English pseudo-code. It avoids the use of java.util.* (e.g. ArrayList or Arrays classes) and only uses primitive constructs.

count = 0
for each item in the input
    if the rule matches
       increase count by 1

create output array of size count

target index = 0
for each item in the input
    if the rule matches
        add the item to the output array at the target index,
        and increase the target index by 1

return the output array

This code is O(n) in complexity, even though it loops through the input (n) twice because that's a constant factor, and O(2*n) is 2*O(n) is O(n).

Now, the constant bounds could be slightly reduced by, instead of only counting on the first pass, also compacting the values on the first pass, and then only copying the compacted values, which would be less than or equal to n, to a new smaller array. It would still be O(n), but it may have a slightly lower wall-clock time .. or it might perform worse depending on subtle cache/JIT/data factors. Oh, the fun intricacies of modern computers!

There is no trivial way to improve the O(n) "efficiency" bounds - and especially not for one run.

Upvotes: 1

Aman Agnihotri
Aman Agnihotri

Reputation: 3033

This approach comes to my mind:

public ArrayList<String> search(String searchString, String[] names)
{
  ArrayList<String> searchList = new ArrayList<String>();

  for (String name : names)
  {
    if(name.contains(searchString))
    {
      searchList.add(name);
    }
  }

  return searchList;
}

Now to search, use this:

String[] names = {"Jordan", "Jord", "Anna", "Rob", "RobJord"};
String searchString = "Jord";

ArrayList<String> filterList = search(searchString, names);

It doesn't use java.util.Arrays methods, and also gets the job done in a clean way, not to mention, its fast.

Now if you can't even use ArrayList, then you have two choices:
1. Make your own implementation of ArrayList and use that.
2. Follow the following method:

public String[] search(String searchString, String[] names)
{
  int size = getSize(searchString, names);
  String[] searchList = new String[size];

  int index = 0;
  for (String name : names)
  {
    if(name.contains(searchString))
    {
      searchList[index++] = name;
    }
  }

  return searchList;
}

// Returns appropriate size for the Search List
private int getSize(String searchString, String[] names)
{
  int size = 0;
  for (String name : names)
  {
    if(name.contains(searchString))
    {
      size++;
    }
  }

  return size;
}

Upvotes: 2

Related Questions