user965692

Reputation: 221

Removing stop words in one file from another file

I have a file, file1.txt, which contains some words. I have another file, blacklistwords.txt, and I need to remove all the words listed in blacklistwords.txt from file1.txt.

file1.txt
----------
return  25
murder  28
another  54
stackoverflow  12
response  16
violence  32


blacklistwords.txt
------------------
violence
murder
crime

This is what the output should look like:

Final output:
-------------
return  25
another  54
stackoverflow  12
response  16

Upvotes: 1

Views: 2089

Answers (3)

Al S

Reputation: 375

While breaking down user965692's solution for my own case, I found the need for another option, -w, which matches whole words only.

To break it down fully:

  • -i tells grep to ignore case
  • -F tells grep to treat the patterns as fixed strings rather than regular expressions
  • -w matches whole words only (i.e., if "flow" were a stop word, it would not match "overflow")
  • -v inverts the match (i.e., only lines that match none of the strings are printed)
  • -f blacklistwords.txt reads the patterns from the given file, one per line

Hence, to remove all of the blacklisted words:

grep -i -F -w -v -f blacklistwords.txt file1.txt
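
To see what -w changes, here is a small sketch that runs the command with and without it. The "flow" stop word and the "flow  7" line are made-up test data (not from the question), and everything happens in a scratch directory created with mktemp:

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical test data: "flow" is a stop word here.
printf '%s\n' 'return  25' 'stackoverflow  12' 'flow  7' > file1.txt
printf '%s\n' 'flow' > blacklistwords.txt

# Without -w, "flow" also matches inside "stackoverflow",
# so that line is dropped along with "flow  7".
without_w=$(grep -i -F -v -f blacklistwords.txt file1.txt)

# With -w, only the standalone word "flow" matches,
# so "stackoverflow  12" is kept.
with_w=$(grep -i -F -w -v -f blacklistwords.txt file1.txt)

printf 'without -w:\n%s\n\nwith -w:\n%s\n' "$without_w" "$with_w"
```

Without -w only "return  25" is left; with -w both "return  25" and "stackoverflow  12" are left.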

Upvotes: 0

user2719058

Reputation: 2233

Your solution is basically correct.

Let me just note that you did not ask for case-insensitive matching, and adding it via the -i switch imposes quite a big performance penalty, at least in Unicode environments, so you might want to drop it if it's not really needed.
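
As a rough sketch of the trade-off, the snippet below runs the command with and without -i on a small sample. The "Murder  3" line is made-up test data added only to show the difference in results; it does not measure speed:

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Sample data; the capitalized "Murder  3" line is hypothetical.
printf '%s\n' 'return  25' 'murder  28' 'Murder  3' 'violence  32' > file1.txt
printf '%s\n' 'violence' 'murder' 'crime' > blacklistwords.txt

# Case-sensitive (no -i): "Murder" does not match "murder" and is kept.
case_sensitive=$(grep -F -v -f blacklistwords.txt file1.txt)

# Case-insensitive (-i): "Murder" is removed as well.
case_insensitive=$(grep -i -F -v -f blacklistwords.txt file1.txt)

printf 'without -i:\n%s\n\nwith -i:\n%s\n' "$case_sensitive" "$case_insensitive"
```

So dropping -i is only safe if the words in both files are known to use the same case.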

Upvotes: 0

user965692

Reputation: 221

I tried this and it worked:

grep -i -F -v -f blacklistwords.txt file1.txt
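
For anyone who wants to reproduce this, here is a minimal sketch that recreates the two files from the question in a scratch directory (the mktemp directory is just a convenience, not part of the original setup) and runs the same command:

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Recreate the sample files from the question.
printf '%s\n' 'return  25' 'murder  28' 'another  54' \
  'stackoverflow  12' 'response  16' 'violence  32' > file1.txt
printf '%s\n' 'violence' 'murder' 'crime' > blacklistwords.txt

# Print every line of file1.txt that contains none of the blacklisted strings.
result=$(grep -i -F -v -f blacklistwords.txt file1.txt)
printf '%s\n' "$result"
```

This prints the four lines shown under "Final output" in the question. One thing to watch for, at least with GNU grep: an empty line in blacklistwords.txt acts as an empty pattern that matches every line, so with -v nothing would be printed.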

Upvotes: 2
