the_Ninja
the_Ninja

Reputation: 11

Sorting a vector of strings by the first letter in non-ascii order in C++

I have a text file with a list of words.

I used ifstream to read these words into a vector and now I am trying to sort them in an order similar to:

A a B b C c [...]

I tried to implement this using a third for loop inside of a bubble search algorithm to look at the first character of each word (I know this is far from the most efficient way especially if I was using a large data set)

And then check whether the letter and the next letter were uppercase or lowercase and switching if the uppercase letter was the same letter as the current letter, but this didn't seem to work.

void bubble_Sort (vector <string> & words)
{
  for (unsigned i = words.size(); i >= 2; --i)
  {
    for (unsigned k = 0; k + 1 < i; k++)
    {
      int hi = k+1;
      string temp1 = words[hi];
      string temp2 = words[k];
      int smallsize = words[hi].size();
      int smallprecedence = 0;

      if (words[k].size() < words[hi].size())
        smallsize = words[k].size();

      for (unsigned j = 0; j < smallsize; j++)
      {
        if (temp1[j] >= 'A' && temp1[j] <= 'Z')
        {
          if (temp2[j] >='a' && temp2[j] <= 'z')
          {
            char lowercase1 = temp1[j] + 32;
            if (lowercase1 == temp2[j])
            {
              string temp = words[k];
              words[k] = words[hi];
              words[hi] = temp;
              break;
            }
          }

          else if (temp2[j] >= 'A' && temp2[j] <= 'Z')
          {
            if (temp1[j] < temp2[j])
            {
              string temp = words[k];
              words[k] = words[hi];
              words[hi] = temp;
              break;
            }
          }
        }

        if (temp1[j] >= 'a' && temp1[j] <= 'z')
        {
          if (temp2[j] >= 'A' && temp2[j] <= 'Z')
          {
            char uppercase1 = temp1[j] - 32;
            if (uppercase1 < temp2[j])
            {
              string temp = words[k];
              words[k] = words[hi];
              words[hi] = temp;
              break;
            }
          }

          else if (temp2[j] >= 'a' && temp2[j] <= 'z')
          {
            if (temp1[j] < temp2[j])
            {
              string temp = words[k];
              words[k] = words[hi];
              words[hi] = temp;
              break;
            }
          }
        }

        else if (temp1[j] == temp2[j] && temp1.size() < temp2.size())
          ++smallprecedence;
      }

      if (smallprecedence == smallsize)
      {
        string temporary = words[k];
        words[k] = words[hi];
        words[hi] = temporary;
      }
    }
  }
}

Upvotes: 1

Views: 2214

Answers (4)

Mark Lakata
Mark Lakata

Reputation: 20818

Don't reinvent the wheel. Just modify the default comparison function so aA < bB (regardless of case) and A < a.

EDIT I used the wrong comparison function. It should return true for <, and false for >=. This has been fixed

std::vector<std::string> vec;
//
std::sort(vec.begin(), vec.end(), [](const std::string& lhs, const std::string& rhs)
{ 
   const char* s1=lhs.c_str();
   const char* s2=rhs.c_str();
   while(true) {
     // first ignore case
     if ( std::toupper(*s1) < std::toupper(*s2) ) return true;
     if ( std::toupper(*s1) > std::toupper(*s2) ) return false;
     // end of both strings, exact match
     if ( *s1 == 0 && *s2 == 0 ) return false;
     // compare upper case vs lower case ('A' vs 'a')
     if ( *s1 > *s2) return false;
     if ( *s1 < *s2) return true;
     ++s1; ++s2;
  }  
});

Upvotes: 4

AndreyS Scherbakov
AndreyS Scherbakov

Reputation: 2788

I think it's really enough to skip exactly equal prefixes and then compare once with uppercasing:

std::vector<std::string> vec;
//
std::sort(vec.begin(), vec.end(), [](const std::string& lhs, const std::string& rhs)
{ 
  const char* s1=lhs.c_str();
  const char* s2=rhs.c_str();
  while(*s1 && *s1 == *s2) {++s1; ++s2;}  
  int rc = toupper(*s1) - toupper(*s2);
  if (rc) return rc;
  return *s1 - *s2;
});

If you need to compare by first letter only, simply remove while(*s1 && *s1 == *s2) {++s1; ++s2;}

Upvotes: 0

ObliteratedJillo
ObliteratedJillo

Reputation: 5166

You can sort a vector using std::sort and get a reference to the first character in a std::string using std::string::at() :

std::vector<std::string> vec;
//
std::sort(vec.begin(), vec.end(), [](const std::string& lhs, const std::string& rhs)
{ 
    char l_ch, r_ch;
    l_ch = lhs.at(0);
    r_ch = rhs.at(0);
    return l_ch < r_ch;
});

Upvotes: 0

Pete Becker
Pete Becker

Reputation: 76245

First, get rid of the hard-coded ASCII-isms. C and C++ have long had functions for determining whether a character is a letter, a digit, uppercase, lowercase, etc. Look them up.

Second, describe clearly what goes into determining the order that you want the result to be in.

Third, from that description, write a function that takes two strings, and tells you whether the first string should come before the second. Use that function in the sort.

Upvotes: 1

Related Questions