Randolph

Reputation: 37

Fast way to get the first two and last two characters of a string from the input

Here is my way of doing it. However, I believe there is a smarter way, which is why I am asking this question. As an inexperienced, new C++ programmer, could you please tell me what better ways there are to do this task?

Thank you.

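#include <iostream>
#include <string>

int main() {
    std::string word;
    std::getline(std::cin, word);
    if (word.length() < 2) return 1;  // word[l - 2] below needs at least 2 characters

    // results - I need only those 5 numbers:
    int l = static_cast<int>(word.length());
    int c1 = word[0];
    int c2 = word[1];
    int c3 = word[l - 2];
    int c4 = word[l - 1];
}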

Why do I need this? I want to encode a huge number of really long strings, but I have figured out that I only need those 5 values; the rest is redundant. How many words will be loaded? Enough to make this part of the code worth optimizing :)

Upvotes: 1

Views: 1158

Answers (3)

kuroi neko

Reputation: 8641

If you want to optimize this (I can't imagine why you would want to, but surely you have your reasons), the first thing to do is to get rid of std::string and read the input directly. That will spare you one copy of the whole string.
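A minimal sketch of reading the input directly, assuming a `std::FILE*` source such as `stdin`. It keeps only the first two and the two most recent characters of each line, so no `std::string` is ever built. `LineInfo` and `read_line_info` are hypothetical names, and reporting -1 for positions that do not exist on lines shorter than two characters is an assumption, since the question does not say what should happen there.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical result type: line length plus the first two and last two
// characters as unsigned values (0-255), or -1 where the line is too short.
struct LineInfo {
    std::size_t length;
    int c1, c2, c3, c4;
};

// Reads one line from `in` one character at a time without storing it:
// only the first two and the two most recent characters are kept.
// Returns false when the input is exhausted before a new line starts.
bool read_line_info(std::FILE* in, LineInfo& out) {
    out = {0, -1, -1, -1, -1};
    int ch = std::getc(in);
    if (ch == EOF) return false;
    while (ch != EOF && ch != '\n') {
        if (out.length == 0) out.c1 = ch;
        else if (out.length == 1) out.c2 = ch;
        out.c3 = out.c4;  // slide the two-character window
        out.c4 = ch;
        ++out.length;
        ch = std::getc(in);
    }
    return true;
}
```

Typical use is `LineInfo li; while (read_line_info(stdin, li)) { /* use li */ }`. Each `getc` still goes through stdio's buffer, so this removes the string copy but not the per-character call overhead.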

If your input is stdin, the stream buffering will slow you down too. As has already been said, the best speed would be achieved by reading big chunks from a file in binary mode and doing the end-of-line detection yourself.
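A sketch of the chunked approach, assuming `'\n'` line endings and a `std::FILE*` opened in binary mode (`"rb"`). `for_each_line`, `LineState`, and the chunk size are illustrative choices, not something the answer specifies. Newlines are located with `std::memchr`, and only a few bytes of per-line state cross a chunk boundary, so long lines are never copied.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

struct LineInfo {
    std::size_t length;
    int c1, c2, c3, c4;  // -1 where the line is too short
};

// Accumulates one line that may arrive in several pieces (one per chunk).
struct LineState {
    LineInfo info{0, -1, -1, -1, -1};

    void append(const unsigned char* p, std::size_t n) {
        if (n == 0) return;
        // First two characters: only the start of the line supplies them.
        if (info.length == 0) {
            info.c1 = p[0];
            if (n >= 2) info.c2 = p[1];
        } else if (info.length == 1) {
            info.c2 = p[0];
        }
        // Last two characters: a one-byte piece shifts the old last one into c3.
        if (n >= 2) {
            info.c3 = p[n - 2];
            info.c4 = p[n - 1];
        } else {
            info.c3 = info.c4;
            info.c4 = p[0];
        }
        info.length += n;
    }
};

// Reads `in` in chunks of `chunk_size` bytes (assumed > 0) and calls
// `on_line(const LineInfo&)` once per line.
template <typename F>
void for_each_line(std::FILE* in, std::size_t chunk_size, F on_line) {
    std::vector<unsigned char> buf(chunk_size);
    LineState st;
    bool pending = false;  // true while st holds an unfinished line
    std::size_t got;
    while ((got = std::fread(buf.data(), 1, buf.size(), in)) > 0) {
        const unsigned char* p = buf.data();
        const unsigned char* end = p + got;
        while (p < end) {
            const void* nl = std::memchr(p, '\n', static_cast<std::size_t>(end - p));
            if (!nl) {  // the line continues in the next chunk
                st.append(p, static_cast<std::size_t>(end - p));
                pending = true;
                break;
            }
            const unsigned char* q = static_cast<const unsigned char*>(nl);
            st.append(p, static_cast<std::size_t>(q - p));
            on_line(st.info);
            st = LineState{};
            pending = false;
            p = q + 1;
        }
    }
    if (pending) on_line(st.info);  // last line had no trailing '\n'
}
```

A real run would use a much larger chunk, for example `for_each_line(f, 1 << 20, handler)` on a file opened with `std::fopen(path, "rb")`; a tiny chunk size is only useful for exercising the boundary handling.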

At any rate, you will be limited by the I/O bandwidth (disk access speed) in the end.

Upvotes: 0

Thomas Matthews

Reputation: 57678

The first two letters are easy and fast to obtain.

The issue is with the last two letters.

In order to read a text line, the input must be scanned until an end-of-line character (usually a newline) is found. Since your text lines are variable-length, there is no fast solution here.

You can mitigate the issue by reading blocks of data from the file into memory and searching that memory for the line endings. This avoids the calls to getline, and it avoids searching for the end of line twice (once by getline and again by your program).

If you change the input to be fixed width, this step can be sped up.
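A minimal sketch of why fixed width helps, assuming a hypothetical layout in which every record is exactly `width` characters followed by `'\n'`. Record `i` then starts at `i * (width + 1)` and the four characters sit at fixed offsets, so no end-of-line search is needed. `fixed_record` and `record_count` are illustrative names. This only applies if the words really share one length; if shorter words were padded, their real characters would no longer sit at fixed offsets.

```cpp
#include <cstddef>

struct LineInfo {
    std::size_t length;
    int c1, c2, c3, c4;
};

// Hypothetical layout: every record is `width` characters plus '\n'.
// Reads the first two and last two characters at fixed offsets.
// Assumes width >= 2 and that record i lies entirely inside `data`.
LineInfo fixed_record(const char* data, std::size_t width, std::size_t i) {
    const unsigned char* r =
        reinterpret_cast<const unsigned char*>(data) + i * (width + 1);
    return {width, r[0], r[1], r[width - 2], r[width - 1]};
}

// Number of complete records in a buffer of n bytes.
std::size_t record_count(std::size_t n, std::size_t width) {
    return n / (width + 1);
}
```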

Upvotes: 1

Mark Ransom

Reputation: 308111

I will take you at your word that this is worth optimizing to the extreme. The method you've shown in the question is already the most straightforward way to do it.

I'd start by using memory mapping to map chunks of the file into memory one at a time. Then loop through the buffer looking for newline characters. Take the first two characters after the previous newline and the last two characters before the one you just found. The address of the newline you just found, minus the address of the previous newline, minus one, is the length of the line. Rinse, lather, and repeat.

Obviously some care will need to be taken around boundaries, where one newline is in the previous mapped buffer and one is in the next.

Upvotes: 2
