Reputation: 2491
How to find the occurrence of letters A-Z regardless(ignore case) in a optimized way even if the file size is as large as 4GB or more ? What could be the different implementations possible in C++/C ?
One implementation is :
Pseudocode
A[26]={0}
loop through each character ch in file
If isalpha(ch)
A[tolower(ch)-'A']+ = 1
End If
end loop
Upvotes: 2
Views: 1597
Reputation: 75986
Not much optimization left, I think.
Instead of computing tolower()-'A'
for each element, just count occurrences of each character (in a char[256]
accumulator), and do the case-aware computation afterwards (Might be more efficient or not, just try).
Be sure to use buffered input (fopen
, perhaps assign larger buffer with setvbuf
).
Eg:
acum[256]={0}
loop through each character 'c' in file
acum[c]++
end loop
group counts corresponding to same lowercase/uppercase letters
Also, bear in mind that this assumes ASCII or derived (one octet = one character) encoding.
Upvotes: 9
Reputation: 67283
This is not going to be instantaneous with 4GB. I see know way to do what you are doing much faster.
In addition, your code wouldn't handle tabs, spaces or other characters. You need to use isalpha()
and only increment the count if it returns true.
Note that isalpha()
is extremely fast. But, again, this code would not be instantaneous with a very large input.
TCHAR a[26] = { 0 };
for (int i = 0; i < length; i++)
{
if (isalpha(text[i]))
{
a[tolower(text[i]) - 'a']++;
}
}
Upvotes: 1