Ralfs R

Reputation: 77

How to split a string while ignoring commas in C#?

I have made a little project that takes .cs files, reads them and returns the most frequent word in the file. However, right now it reports that the most common word is a comma. How can I make the split ignore commas?

For example, I have this string:

, . ? a a, b cdef cfed, abef abef abef,

Right now it reports that the most common word is 'abef' and that it occurred 2 times (the program doesn't count the third abef, the one followed by a comma).

Another example:

, . ? a a, b cdef cfed, abef abef abef, , ,

This currently reports that the most common word is a comma ',' and that it occurred 3 times, but I want my program to ignore commas and count only words.

namespace WindowsFormsApp8
{
  public partial class Form1 : Form
  {
    public Form1()
    {
        InitializeComponent();
    }


    private async void button1_Click(object sender, EventArgs e)
    {
        using (OpenFileDialog ofd = new OpenFileDialog() { Filter = "Text Documents |*.cs;*.txt", ValidateNames = true, Multiselect = false }) //openfiledialog (all .cs; all.txt)
        {
            if (ofd.ShowDialog() == DialogResult.OK) //if in file dialog a file gets selected
            {
                using (StreamReader sr = new StreamReader(ofd.FileName)) //text reader
                {
                    richTextBox1.Text = await sr.ReadToEndAsync(); //reads the file and returns it into textbox
                }
            }
        }
    }

    private void button2_Click(object sender, EventArgs e)
    {          
        string[] userText = richTextBox1.Text.ToLower().Split( ' ' );
        var frequencies = new Dictionary<string, int>(); // variable frequencies, dictionary with key string, value int.
        string highestWord = null;  //declare string highestword with starting value null.
        int highestFreq = 0; //declare integer highestfreq with starting value zero.

        foreach (string word in userText) //search words in our array userText that we declared at the beginning.
        {
            int freq; //declare integer freq.
            frequencies.TryGetValue(word, out freq); //trygetvalue from dictionary key, out value.
            freq += 1; //count it.

            if (freq > highestFreq) 
            {
                highestFreq = freq;
                highestWord = word;
            }
            frequencies[word] = freq; //store the updated count for this word in the dictionary.
        }
        MessageBox.Show("the most occuring word is: " + highestWord + ", it occured " + highestFreq + " times"); //display data to messagebox.
    }
  }
}

Upvotes: 0

Views: 460

Answers (3)

Steve

Reputation: 216273

Split can take an array of chars to split on, so you can split on both space and comma. Then remove the empty entries with StringSplitOptions.RemoveEmptyEntries:

 string[] userText = richTextBox1.Text.ToLower().Split(new char[] { ' ', ','}, StringSplitOptions.RemoveEmptyEntries );

You can also use LINQ to calculate the frequency of each word, with code like this:

var g = userText.GroupBy(x => x)
                .Select(z => new 
                { word = z.Key, count = z.Count()})
                .ToList();
string mostUsed = g.OrderByDescending(x => x.count)
                   .Select(x => x.word)
                   .FirstOrDefault();
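Putting the two pieces together, a minimal self-contained sketch might look like the following. The class name WordFrequency and the method MostFrequent are hypothetical names chosen for illustration, and the string argument stands in for richTextBox1.Text.

```csharp
using System;
using System.Linq;

public static class WordFrequency
{
    // Hypothetical helper: returns the most frequent word and its count,
    // or (null, 0) when the text contains no words.
    public static Tuple<string, int> MostFrequent(string text)
    {
        // Split on spaces and commas; drop the empty entries that
        // consecutive separators would otherwise produce.
        string[] words = text.ToLower().Split(
            new char[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);

        // Group identical words, count each group, and take the largest.
        var top = words.GroupBy(w => w)
                       .Select(g => new { Word = g.Key, Count = g.Count() })
                       .OrderByDescending(x => x.Count)
                       .FirstOrDefault();

        return top == null
            ? Tuple.Create((string)null, 0)
            : Tuple.Create(top.Word, top.Count);
    }
}
```

For the second example string from the question, WordFrequency.MostFrequent returns "abef" with a count of 3. Note that "." and "?" are still treated as words here; adding them to the separator array would exclude them as well.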

Upvotes: 6

Alexei - check Codidact

Reputation: 23078

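Another option is to make the tokenizing easier to extend by using regular expressions. Match the words with Regex.Matches rather than splitting on them (Regex.Split with @"\w+" would return the separators between the words, not the words themselves):

  string input = ", . ? a a, b cdef cfed, abef abef abef, , ,";
  // \w+ matches each run of word characters, so punctuation never becomes a token.
  string[] result = Regex.Matches(input, @"\w+").Cast<Match>()
                         .Select(m => m.Value).ToArray();

If ? should count as a valid word, then the pattern could be @"\w+|\?".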

So, my recommendation is to use regex, even if the split method is enough for now, since it is more powerful and can easily accommodate later changes.

As a bonus, it is nice to learn about regular expressions.
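As a rough sketch of how the regex approach could feed the frequency count from the question, the following uses Regex.Matches to pull out the words and a dictionary to tally them. The class name RegexWordCounter and the method Count are hypothetical, and the default pattern is only one reasonable choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexWordCounter
{
    // Hypothetical helper: counts every token matched by the given pattern.
    public static Dictionary<string, int> Count(string text, string pattern = @"\w+")
    {
        var frequencies = new Dictionary<string, int>();
        foreach (Match m in Regex.Matches(text.ToLower(), pattern))
        {
            int freq;
            frequencies.TryGetValue(m.Value, out freq); // freq stays 0 for a new word
            frequencies[m.Value] = freq + 1;
        }
        return frequencies;
    }
}
```

With the default pattern, commas, dots and question marks never appear as keys. Passing @"\w+|\?" as the pattern would additionally count "?" as a word. Keep in mind that \w also matches digits and underscores, which may or may not be what you want for .cs files.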

Upvotes: 2

akerra

Reputation: 1200

You could replace the commas with an empty string, then run the output through your algorithm.

string original = ", . ? a a, b cdef cfed, abef abef abef,";
string noCommas = original.Replace(",", string.Empty);

Reference: https://msdn.microsoft.com/en-us/library/fk49wtc1(v=vs.110).aspx
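One thing to watch for: a comma that stood on its own leaves two adjacent spaces behind, so the later split should drop empty entries. A minimal sketch, assuming the text is then split on spaces only (CommaStripper and Tokenize are hypothetical names):

```csharp
using System;

public static class CommaStripper
{
    // Hypothetical helper: removes commas, then splits on spaces.
    public static string[] Tokenize(string original)
    {
        string noCommas = original.Replace(",", string.Empty);
        // Removing a stand-alone comma leaves consecutive spaces,
        // so empty entries are discarded here.
        return noCommas.ToLower().Split(
            new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

The resulting array can be passed to the counting loop from the question. Note that this approach merges words separated only by a comma, so "a,b" would become "ab".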

Upvotes: 3
