m.lapeyre
m.lapeyre

Reputation: 486

Count sequences of letters in a string - C++

I'm working on a simple substitution-cipher decoder. I'm using frequency analysis to decrypt the ciphertext. Just looking at the frequency of unique letter isn't enough. I need to look at the occurrences of 2-letter sequences (maybe 3-letter sequences).

My code for counting the occurrences of each letter is below

int counterRaw[256][2] = {0};
    for(int i = 0; i <inputString.length(); i++)
        counterRaw[inputString[i]][1]++;

int counterLetter[26][2] = {0};
    for(int i = 0 ; i<26 ; i++){
        counterLetter[i][0] = 'A'+i;
        counterLetter[i][1] = counterRaw['A'+i][1];

As you can see very simple yet effective !

But I don't know how to achieve a 2-letter sequence counter, do you have any idea which could help me to code this ?

Thanks !

EDIT : As an example Given AZAZ RTYU JKLM I want my program to output :

AZ : 2
ZA : 1
ZR : 1
RT : 1
...

Upvotes: 0

Views: 2063

Answers (4)

Axalo
Axalo

Reputation: 2953

Here you go (based on the idea of user3723779):

#define MAKEWORD(a, b) (((a) << 8) | (b))

std::string noSpaces(std::string s)
{
    int pos;
    while((pos = s.find(' ')) != std::string::npos)
    {
        s.erase(pos, 1);
    }
    return s;
}

std::map<short, int> seqDet2(const std::string &s)
{
    int length = s.length();
    if(length == 0) return std::map<short, int>();

    // assert(sizeof(char) == 1);
    std::vector<short> v;

    for(int i = 0; i < length - 1; ++i)
    {
        v.push_back(MAKEWORD(s[i], s[i + 1]));
    }

    std::map<short, int> occ;
    for(auto x: v)
    {
        occ[x]++;
    }

    return occ;
}

int main()
{
    std::string s = "AZAZRTYUI AZTWI";
    auto occ = seqDet2(noSpaces(s));

    for(auto x: occ)
    {
        unsigned char b = (unsigned char)x.first;
        unsigned char a = (unsigned char)(x.first >> 8);
        printf("%c%c - %d\n", a, b, x.second);
    }

    getchar();
}

Upvotes: 0

HexedAgain
HexedAgain

Reputation: 1031

Something like the following would do the trick, though you'd have to do some jiggery pokery to make it suit your own needs.

#include <iostream>
#include <map>
#include <string>

int main ()
{
    std::string message("some string that you will probably get from some encrypted file");
    std::map<std::string,int> occurences;

    std::string seq("  ");
    for(int i = 1; i < message.length() - 1; i++)
    {
        seq[0] = message[i-1];
        seq[1] = message[i];

        //ignore spaces
        if (seq.compare(0,1, " ") && seq.compare(1,1, " "))
        {
            occurences[seq]++;
        }
    }

    //let's have a look ...
    for(auto iter = occurences.begin(); iter != occurences.end(); ++iter)
    {
        std::cout << iter->first << "   " << iter->second << "\n";
    }
    return 0;
}

output:

ab   1
at   1
ba   1
bl   1
cr   1
ed   1
en   1 
et   1
fi   1
fr   1
ge   1
ha   1
il   2
in   1
ll   1
ly   1
me   2
nc   1
ng   1
ob   1
om   3
ou   1
pr   1
pt   1
ri   1
ro   2
ry   1
so   2
st   1
te   1
th   1
tr   1
wi   1
yo   1
yp   1

Upvotes: 1

user3723779
user3723779

Reputation: 287

You should create a "composite letter" from two letters. As letters in C,C++ are numbers, you can just convert each of the 2 letters to a number ( the characters are already numbers ) and than create a number with two numbers. e.g. int C=inputString[i]+256*inputString[i+1]. The above with the supposition that the strings are of char and chars are between 0 and 255 ( better than signed ).

Upvotes: 1

basav
basav

Reputation: 1495

What you are doing right now is a counting sort. A radix sort would be a viable option for you if you take multiple digits into consideration.

Upvotes: 0

Related Questions