kazuo

Reputation: 21

Edit large text file in Mac terminal

I have a very large dictionary file with one word per line, and I would like to trim it down.

I want to keep only the 3-6 letter words that are not proper nouns, so the filter has to apply these rules:

  1. if the word is less than 3 letters, delete it
  2. if the word is more than 6 letters, delete it
  3. if the word has a capital letter, delete it
  4. if the word has a single quote or space, delete it.

I used this:

cat Downloads/en-US/en-US.dic | egrep '[a-z]{3,6}' > Downloads/3-6.txt

but the output is incorrect. It does drop the words shorter than 3 characters, but that is as far as I have got.

So how do I go about doing this in the Mac terminal? There must be a way to do this, right?

Upvotes: 2

Views: 1286

Answers (2)

Anirvan

Reputation: 6364

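The following command will select only lines that consist entirely of three to six lowercase a-z letters. The ^ and $ anchors force the pattern to match the whole line; without them, grep keeps any line that merely contains such a run of letters somewhere: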

egrep '^[a-z]{3,6}$' /usr/share/dict/words > filtered.txt

Replace /usr/share/dict/words with your input file, and filtered.txt with a name for your output file. I just verified that this works on my Mac. Hope this helps!
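
To see what the anchors change, here is a small sketch that runs both patterns over a made-up word list. The sample words and the variable names are assumptions for illustration only, and LC_ALL=C is set on the assumption that the dictionary is plain ASCII, so that [a-z] means exactly the 26 lowercase letters regardless of locale.

```shell
# Assumption: ASCII input. Forces byte-wise ranges so [a-z] is exactly a-z.
export LC_ALL=C

# Hypothetical sample input: one entry per line, covering each rule.
sample=$(printf '%s\n' at cat Cat "don't" banana bananas 'ice cream' zebra)

# Unanchored (the pattern from the question): keeps any line that
# contains 3 or more lowercase letters in a row anywhere in it.
unanchored=$(printf '%s\n' "$sample" | grep -E '[a-z]{3,6}')

# Anchored: the whole line must be 3 to 6 lowercase letters.
anchored=$(printf '%s\n' "$sample" | grep -E '^[a-z]{3,6}$')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

With this sample, the unanchored pattern still lets through don't, bananas and ice cream, while the anchored one keeps only cat, banana and zebra.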

Upvotes: 2

Billy Moon

Reputation: 58619

Use grep and write a regex rule to match the lines you want to keep. You can get info on grep by typing man grep in the terminal.
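
As a rough sketch of what such a rule could look like for the four conditions in the question: grep's -x option requires the pattern to match the entire line, so no explicit anchors are needed. The function name filter_words is a hypothetical helper, not part of grep, and LC_ALL=C assumes an ASCII word list.

```shell
# filter_words is a hypothetical helper name used only for this example.
# -E enables extended regex syntax such as {3,6};
# -x keeps a line only if the pattern matches the whole line.
# LC_ALL=C (assumption: ASCII input) makes [a-z] mean exactly a-z.
filter_words() {
    LC_ALL=C grep -E -x '[a-z]{3,6}' "$@"
}

# Reads standard input when no file name is given:
printf '%s\n' dog Dog elephant "it's" tiger | filter_words
```

To filter a real file, pass its path and redirect the output, for example filter_words Downloads/en-US/en-US.dic > Downloads/3-6.txt.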

Upvotes: 1
