Programmer

Reputation: 39

Delete stopwords from text file in C#

I read two text files: the first contains Arabic text, which I split into words; the second contains the stop-words. I want to delete every stop-word (listed in the second file) from the first file, but I don't know how to do this:

FileStream fs = new FileStream(@"H:\arabictext.txt", FileMode.Open);
StreamReader arab = new StreamReader(fs,Encoding.Default,true);
string artx = arab.ReadToEnd();
richTextBox1.Text = artx;
arab.Close();
char[] dele = {' ', ',', '.', '\t', ';','#','!' };

string[] words = richTextBox1.Text.Split(dele);

FileStream fsw = new FileStream("H:\\arab.txt", FileMode.Create);
StreamWriter arabw = new StreamWriter(fsw,Encoding.Default);

foreach (string s in words)
{
    arabw.WriteLine(s);
}
// close the writer so the buffered output is flushed to disk
arabw.Close();

Upvotes: 0

Views: 928

Answers (2)

Programmer

Reputation: 39

I found a solution to my question. Do you have a better one?

        char[] dele = { ' ', ',', '.', '\t', ';', '#', '!' };
        using (TextWriter tw = new StreamWriter(@"H:\output.txt"))
        {
            using (StreamReader reader = new StreamReader("H:\\arabictext.txt",Encoding.Default,true))
            {
                string line;

                while ((line = reader.ReadLine()) != null)
                {
                    string[] stopWord = new string[] { "قد", "في", "بيت", "فواصل", "هي", "من","$","ُ","ِ","ُ","ّ","ٍ","ٌ","ْ","ً" };


                    foreach (string word in stopWord)
                    {

                        line = line.Replace(word, "");

                    }

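                    // WriteLine keeps the line break, so the last word of one line
                    // is not glued to the first word of the next
                    tw.WriteLine(line);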


                }
            }
        }
        FileStream fs = new FileStream(@"H:\output.txt", FileMode.Open);
        StreamReader arab = new StreamReader(fs,Encoding.Default,true);
        string artx = arab.ReadToEnd();
        arab.Close();
        string[] words = artx.Split(dele);

        FileStream fsw = new FileStream("H:\\result.txt", FileMode.Create);
        StreamWriter arabw = new StreamWriter(fsw,Encoding.Default);
        foreach (string s in words)
        {

         arabw.WriteLine(s);

        }
        arabw.Close();
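
The diacritic marks in the stop-word list above could also be stripped by Unicode category instead of being listed one by one. This is only a minimal sketch, assuming that every non-spacing mark should be removed (which is broader than the marks listed above and would also strip combining accents in other scripts). The names `DiacriticStripper` and `StripMarks` are hypothetical, chosen for illustration.

```csharp
using System.Globalization;
using System.Linq;

public static class DiacriticStripper
{
    // Drops every character whose Unicode category is NonSpacingMark.
    // The Arabic marks U+064B..U+0652 (tanween, fatha, damma, kasra,
    // shadda, sukun) all fall into that category.
    public static string StripMarks(string text)
    {
        return string.Concat(
            text.Where(c => CharUnicodeInfo.GetUnicodeCategory(c)
                            != UnicodeCategory.NonSpacingMark));
    }
}
```

It could be called on each line before the stop-words are replaced, for example `line = DiacriticStripper.StripMarks(line);`.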

Upvotes: 0

Ali Bahrami

Reputation: 6073

If I understand you correctly, you want to take the stop-words from the first file and remove them from the second file.

Here is my workaround:

  1. Extract the stop-words from the first file with the Split method.
  2. Iterate over the extracted words and replace each one with String.Empty in the content of the second file.
  3. Save the file.

I simplified your code into the code below:

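        // requires: using System; using System.Linq;

        // read file contents
        var fileContent1 = System.IO.File.ReadAllText("file1.txt");
        var fileContent2 = System.IO.File.ReadAllText("file2.txt");

        // extract stop-words from the first file; line breaks are added as
        // delimiters, and empty entries are dropped because string.Replace
        // throws on an empty search string
        var words = fileContent1.Split(new char[] { ' ', ',', '.', '\t', ';', '#', '!', '\r', '\n' },
                                       StringSplitOptions.RemoveEmptyEntries)
                                .Distinct();

        // remove stop-words from file2; strings are immutable,
        // so the result of Replace must be assigned back
        foreach (var word in words)
            fileContent2 = fileContent2.Replace(word, string.Empty);

        System.IO.File.WriteAllText("file2.txt", fileContent2);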
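
One caveat with Replace: it matches substrings, so a stop-word that happens to occur inside a longer word is cut out of that word as well. A whole-word variant might look like the sketch below. It assumes the text can be tokenized on the same delimiters as above, and it rejoins the remaining words with single spaces, so the original punctuation and spacing are lost. The names `StopWordFilter` and `RemoveStopWords` are hypothetical, chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordFilter
{
    // Delimiters from the question, plus line breaks (an assumption).
    private static readonly char[] Delimiters =
        { ' ', ',', '.', '\t', ';', '#', '!', '\r', '\n' };

    // Returns the words of `text` that do not appear in `stopWordText`,
    // joined by single spaces.
    public static string RemoveStopWords(string text, string stopWordText)
    {
        // A HashSet gives exact-match lookups (ordinal comparison).
        var stopWords = new HashSet<string>(
            stopWordText.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.Ordinal);

        var kept = text
            .Split(Delimiters, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !stopWords.Contains(word));

        return string.Join(" ", kept);
    }
}
```

With the variables from the code above, the call would be `StopWordFilter.RemoveStopWords(fileContent2, fileContent1)`.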

Upvotes: 1
