Reputation: 21
I need to write a frequency analysis console program in C#. It has to show the 10 most frequent letters in a text file. So far I have managed to display the first 10 letters read by the program and the frequency of each one, but I don't know how to sort the dictionary. This is the code I have so far.
I must also give the user the option to run the frequency analysis in case-sensitive mode (as it is right now) or case-insensitive mode. Help with this will also be appreciated. Thank you!
using System;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        // 1.
        // Array to store frequencies: one slot per possible char value,
        // so the size is char.MaxValue + 1 (indices 0..65535).
        int[] c = new int[char.MaxValue + 1];

        // 2.
        // Read the text file once, line by line.
        string path = @"c:\Users\user\Documents\Visual Studio 2015\Projects\ConsoleApplication1\ConsoleApplication1\App_Data\text.txt";
        foreach (string line in File.ReadLines(path, Encoding.UTF8))
        {
            // 3.
            // Iterate over each character.
            foreach (char t in line)
            {
                // Increment table.
                c[t]++;
            }
        }

        // 4.
        // Write the first 10 letters/digits found (in character-code order, not sorted by frequency).
        int counter = 0;
        for (int i = 0; i <= char.MaxValue && counter < 10; i++)
        {
            if (c[i] > 0 && char.IsLetterOrDigit((char)i))
            {
                ++counter;
                Console.WriteLine("Letter: {0} Frequency: {1}", (char)i, c[i]);
            }
        }

        Console.ReadLine();
    }
}
Upvotes: 0
Views: 451
Reputation: 23786
I like @Dmitry Bychenko's answer because it's very terse. But if you have a very large file, that solution may not be optimal for you. Although File.ReadLines streams the file, GroupBy buffers every matching character in its groups before they can be counted, so memory use grows with the size of the file: in my tests, a 500MB file took up to around 1GB of memory. The solution below, while not quite as terse, only keeps the current line and a small dictionary (one entry per distinct character) in memory, so its memory use stays roughly constant regardless of file size, and it ran as fast as or faster than the LINQ version in my tests.
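using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
...
// Stream the file one line at a time and count every character.
Dictionary<char, int> freq = new Dictionary<char, int>();
using (StreamReader sr = new StreamReader(@"yourBigFile")) {
    string line;
    while ((line = sr.ReadLine()) != null) {
        foreach (char c in line) {
            if (!freq.ContainsKey(c)) {
                freq[c] = 0; // first time this character is seen
            }
            freq[c]++;
        }
    }
}
// Keep letters and digits only, highest count first, and take the top 10.
var result = freq.Where(c => char.IsLetterOrDigit(c.Key)).OrderByDescending(x => x.Value).Take(10);
Console.WriteLine(string.Join(Environment.NewLine, result));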
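The same top-10 selection also works without a dictionary, on an int[] table indexed by character code like the one in the question. That table has a fixed size of 65,536 entries, so memory stays constant here too. A minimal sketch follows; the names ArrayFrequency, CountChars and TopLettersOrDigits are hypothetical and not taken from the question or this answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ArrayFrequency
{
    // Builds a counts table indexed by char code, like the int[] in the question.
    public static int[] CountChars(IEnumerable<string> lines)
    {
        int[] counts = new int[char.MaxValue + 1];
        foreach (string line in lines)
            foreach (char ch in line)
                counts[ch]++; // char converts implicitly to an int index
        return counts;
    }

    // Returns the `top` most frequent letters/digits, highest count first,
    // ties broken by character code.
    public static List<KeyValuePair<char, int>> TopLettersOrDigits(int[] counts, int top)
    {
        return Enumerable.Range(0, counts.Length)
            .Where(i => counts[i] > 0 && char.IsLetterOrDigit((char)i))
            .Select(i => new KeyValuePair<char, int>((char)i, counts[i]))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key)
            .Take(top)
            .ToList();
    }
}
```

For example, passing File.ReadLines(path, Encoding.UTF8) to CountChars and the result to TopLettersOrDigits with top set to 10 would give the ten entries to print.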
Upvotes: 1
Reputation: 186668
If all you want is to find frequencies, you don't need a dictionary at all; this is exactly the kind of task LINQ was designed for:
...
using System.Linq;
...
static void Main(string[] args) {
    var result = File
        .ReadLines(@"...", Encoding.UTF8)
        .SelectMany(line => line) // string into characters
        .Where(c => char.IsLetterOrDigit(c))
        .GroupBy(c => c)
        .Select(chunk => new {
            Letter = chunk.Key,
            Count = chunk.Count() })
        .OrderByDescending(item => item.Count)
        .ThenBy(item => item.Letter) // in case of tie sort by letter
        .Take(10)
        .Select(item => $"{item.Letter} freq. {item.Count}"); // $"..." - C# 6.0 syntax

    Console.Write(string.Join(Environment.NewLine, result));
}
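The question also asks for a case-insensitive mode. One way to add it to this pipeline, assuming invariant-culture lower-casing is acceptable for your text, is to fold each character with char.ToLowerInvariant before grouping. A sketch, where LinqFrequency, TopLetters and the ignoreCase flag are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinqFrequency
{
    // Top `top` letters/digits. When ignoreCase is true, each char is folded
    // with char.ToLowerInvariant before grouping, so 'A' and 'a' share one count.
    public static List<string> TopLetters(IEnumerable<string> lines, bool ignoreCase, int top = 10)
    {
        return lines
            .SelectMany(line => line)                                // string into characters
            .Where(c => char.IsLetterOrDigit(c))
            .Select(c => ignoreCase ? char.ToLowerInvariant(c) : c)  // optional case folding
            .GroupBy(c => c)
            .Select(chunk => new { Letter = chunk.Key, Count = chunk.Count() })
            .OrderByDescending(item => item.Count)
            .ThenBy(item => item.Letter)                             // in case of tie sort by letter
            .Take(top)
            .Select(item => $"{item.Letter} freq. {item.Count}")
            .ToList();
    }
}
```

Folding before GroupBy puts 'A' and 'a' in the same group; with ignoreCase set to false the pipeline behaves like the original.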
Upvotes: 3
Reputation: 805
It would be easier to use the actual Dictionary type in C# here, rather than an array:
Dictionary<char, int> characterCountDictionary = new Dictionary<char, int>();
Add a key with a value of 1 if it doesn't exist yet, or increment its value if it does. Then pull the entries of the dictionary out as a list and sort them by their count (the value) in descending order; the first 10 entries are the 10 most frequent characters. For the case-insensitive mode, convert each upper-case character to lower case (e.g. with char.ToLowerInvariant) before inserting it into the dictionary.
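A sketch of the dictionary approach described above, with a flag for the case-sensitive/case-insensitive choice from the question. DictionaryFrequency, Count, Top and caseSensitive are hypothetical names, and lower-casing with char.ToLowerInvariant is an assumption about how case should be folded:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DictionaryFrequency
{
    // Counts letters/digits; lower-cases each one first when caseSensitive is false.
    public static Dictionary<char, int> Count(IEnumerable<string> lines, bool caseSensitive)
    {
        var counts = new Dictionary<char, int>();
        foreach (string line in lines)
        {
            foreach (char raw in line)
            {
                if (!char.IsLetterOrDigit(raw))
                    continue;
                char key = caseSensitive ? raw : char.ToLowerInvariant(raw);
                if (counts.ContainsKey(key))
                    counts[key]++;      // key exists: increment
                else
                    counts.Add(key, 1); // new key: start at 1
            }
        }
        return counts;
    }

    // Copies the entries to a list, sorts by count (highest first, ties by
    // character) and keeps the first `top`.
    public static List<KeyValuePair<char, int>> Top(Dictionary<char, int> counts, int top)
    {
        var entries = counts.ToList();
        entries.Sort((x, y) => y.Value != x.Value
            ? y.Value.CompareTo(x.Value)
            : x.Key.CompareTo(y.Key));
        return entries.Take(top).ToList();
    }
}
```

In the console program you could, for instance, ask the user via Console.ReadLine() whether to ignore case, then pass File.ReadLines(path, Encoding.UTF8) to Count and the result to Top with 10.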
The MSDN documentation for Dictionary includes examples: https://msdn.microsoft.com/en-us/library/xfhwa508(v=vs.110).aspx#Examples
Upvotes: 0