Reputation: 481

C++ sorting text

I was asked the following interview question:

If there are two string inputs what method can be used to print the letters the string inputs have in common. For example if the user input:

working

soaked

output:

ok

What is the best algorithm to solve the problem?

Upvotes: 4

Answers (7)

qwr

Reputation: 3670

Assuming it is english letters.

First I would use boolean array size of 26

bool array[26];

then make it true for each letter at index (letter-'a') (for upper letter-'A')

then for the second string do the same but this time cheking array[index]==true. if yes then add to common list;

Here is small test code:

 bool array[26];
  for(bool &arr:array )arr=false;
  char text1[]="working";
  char text2[]="soaked";
  char common[80]="";


  for(char x:text1){
    int m=x-'a'   ;
    if(m>=0 && m<26){
      array[m]   =true;
    }
  }
  int j=0;
  for(char x:text2){
    int m=x-'a'   ;
    if(m>=0 && m<26){
      if(array[m] ==true){
          common[j]=x;
          ++j;
           array[m]=false; //do not check again
      }

    }
  }
  common[j]='\0';
  printf("%s\n",common);

Upvotes: 2

njlarsson

Reputation: 2340

A simple and efficient solution, which doesn't need to allocate any more memory (for sets), would be to sort both strings, and then scan through them in parallell, like in the merge step of mergesort. Here is an implementation in plain C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int charcmp(char *p, char *q) { return *p - *q; }

int main() {
    char a[] = "working";
    char b[] = "soaked";

    int al = strlen(a);
    int bl = strlen(b);

    qsort(a, al, 1, charcmp);
    qsort(b, bl, 1, charcmp);

    int i = 0, j = 0;
    while (i < al && j < bl) {
        if      (a[i] == b[j]) { printf("%c", a[i]); i++; j++; }
        else if (a[i] < b[j])  {                     i++;      }
        else                   {                          j++; }
    }
    printf("\n");
}

Of course, if the strings are as short as in the example, there is no point in sorting, just scan through one string and look up in the other. But this algorithm scales well with very big string (it's linearithmic, while the lookup version is quadratic).

Upvotes: 0

Jerry Coffin

Reputation: 490108

I would note that set_intersection can appply to any sorted collection, not just std::set. Instead of copying from the strings to sets, then doing set_intersection, I'd just sort the strings and do the set_intersection on them.

This has the advantage of working (what I'd interpret as) correctly in the presence of repeated letters. E.g. given inputs of "wood" and "book" it would produce "oo" instead of just "o".

Upvotes: 2

ChronoTrigger

Reputation: 8617

You tagged your question as c++ and @Igor provided what I think it's the optimal solution for that problem. However, I'd like to contribute with a language-agnostic pseudo-code that solves the problem using the same technique as std::set_intersection:

input: string a, string c
output: string c with common letters

sort(a) // O(nlog(n))
sort(b) // O(mlog(m))
c = empty

i = 0
j = 0
// O(min(n,m))
while i < length(a) && j < length(b) do
  if a[i] == b[j]
    add(c, a[i])
    // increment indexes avoiding duplicates
    do i = i + 1 until a[i] != a[i-1] or i == length(a)
    do j = j + 1 until b[j] != b[j-1] or j == length(b)
  else if a[i] < b[j]
    i = binary_search from a[i+1] to a[end] for b[j]
  else
    j = binary_search from b[j+1] to b[end] for a[i]
  end 
end

// Total cost (nlog(n))
return c

Upvotes: 2

bames53

Reputation: 88155

Your description isn't unambiguous but the way I read it you want to know what letters are the same including position.

#include <string>
#include <iostream>
#include <algorithm>

int main() {
  std::string const a = "working";
  std::string const b = "soaked";

  for (int i = 0; i < std::min(a.size(), b.size()); ++i) {
    if (a[i] == b[i]) {
      std::cout << a[i];
    }
  }
}

Produces:

ok

Upvotes: 4

cdhowie

Reputation: 168988

I would probably approach it like this:

std::set<char> common_characters(std::string const & a, std::string const & b) {
    std::set<char> common;

    std::string const & smaller = (a.size() < b.size()) ? a : b;
    std::string const & larger  = (a.size() < b.size()) ? b : a;

    std::set<char> chars(smaller.begin(), smaller.end());

    for (char i : larger) {
        auto found = chars.find(i);

        if (found != chars.end()) {
            common.insert(*found);
        }
    }

    return common;
}

It's not as slick as using set_intersection() (as in Igor's answer) but it has lower memory complexity and could perform faster as well, depending on the particular input strings.

Upvotes: 1

Igor Tandetnik

Reputation: 52471

string a = "working";
string b = "soaked";
set<char> setA(a.begin(), a.end());
set<char> setB(b.begin(), b.end());

vector<char> common;
set_intersection(setA.begin(), setA.end(), setB.begin(), setB.end(),
                 back_inserter(common));

copy(common.begin(), common.end(), ostream_iterator<char>(cout));

In fact, if no further processing is needed on the intersection, can send it straight to cout:

set_intersection(setA.begin(), setA.end(), setB.begin(), setB.end(),
                 ostream_iterator<char>(cout));

Upvotes: 12

C++ sorting text

Answers (7)

Related Questions