mzaboss

Reputation: 31

Notepad++: How to count occurrences of each word in a text file?

I have a text file containing a list like the following example:

ALEX
MIKE
JOSHUA
AMBER
ALEX
ROBERT
CHRIS
ALEX
JOSHUA
MICHAEL
ROOGER
ALEX
AMBER

I want to count how many times each word appears.

Example output:

ALEX (4)
MIKE (1)
JOSHUA (2) 
AMBER (2)
etc.

How can I do that in Notepad++?

Upvotes: 3

Views: 5372

Answers (2)

Debie Downer

Reputation: 241

While I don't know of an easy way to do this in a usual number system (e.g. decimal or binary) without using something like Python Script or another plugin, I figured I could count in the unary numeral system instead, and I get a free bar chart in the process :)

For all replacements, select "Regular expression" with ". matches newline" unchecked; check or uncheck "Match case" as desired.

  1. Order lines by "Edit -> Line Operations -> Sort Lines Lexicographically Ascending"
  2. Replace all ^ with 1 to add 1 to the start of every line
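  3. Replace all 1(.+$)\R(?=1\1$) with 1 to merge each run of identical lines into a single line, keeping one 1 per occurrence
  4. Replace all ^(1*) with \1 followed by a single space (i.e. "\1 " without the quotes) to add a space after the counts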
  5. Optionally, order by count using "Edit -> Line Operations -> Sort Lines Lexicographically Descending"
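To make the regex steps above easier to follow, here is a rough Python emulation of them. It's a sketch under some assumptions: Python's re module stands in for Notepad++'s regex engine, Python's sorted() stands in for the Notepad++ sort commands, \n stands in for \R (which re doesn't support), and the names unary_word_count and SAMPLE are hypothetical, used only for this illustration.

```python
import re

# The question's sample list, one word per line.
SAMPLE = "\n".join([
    "ALEX", "MIKE", "JOSHUA", "AMBER", "ALEX", "ROBERT", "CHRIS",
    "ALEX", "JOSHUA", "MICHAEL", "ROOGER", "ALEX", "AMBER",
])


def unary_word_count(text):
    """Emulate steps 1-5 with Python's re (case-sensitive, like "Match case" on)."""
    # Step 1: sort lines lexicographically ascending.
    text = "\n".join(sorted(text.splitlines()))
    # Step 2: replace ^ with 1, i.e. prefix every line with one tally mark.
    text = re.sub(r"^", "1", text, flags=re.MULTILINE)
    # Step 3: replace 1(.+$)\R(?=1\1$) with 1, folding each line into an identical
    # next line. Python's re has no \R; lines were joined with \n above, so \n is used.
    text = re.sub(r"1(.+$)\n(?=1\1$)", "1", text, flags=re.MULTILINE)
    # Step 4: replace ^(1*) with "\1 " to put a space after each tally.
    text = re.sub(r"^(1*)", r"\1 ", text, flags=re.MULTILINE)
    # Step 5 (optional): sort lines lexicographically descending. A space sorts
    # below "1", so longer tallies come first.
    return "\n".join(sorted(text.splitlines(), reverse=True))


print(unary_word_count(SAMPLE))
```

Because the replace-all in step 3 scans the original text in a single pass, every duplicate line contributes its 1 to the next identical line, which is why one click is enough.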

I wrote it this way so that AALEX and ALEX aren't treated as duplicates, nor are ALE and ALEX, and also so that the regex can do it in one pass without hitting Replace repeatedly.
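A quick way to check those edge cases is to run step 3's pattern on a few sorted, 1-prefixed snippets. This is a hedged Python sketch: re stands in for Notepad++'s engine, \n replaces \R, and fold, STEP3 and NO_END_ANCHOR are hypothetical names. NO_END_ANCHOR is a deliberately weakened variant (no $ in the lookahead) included only to show why that $ matters. The leading 1 in the pattern can only match at a line start as long as no word contains a 1, so \1 always captures a whole word.

```python
import re


def fold(pattern, text):
    """Apply a step-3-style fold (replace matches with "1") to sorted, 1-prefixed lines."""
    return re.sub(pattern, "1", text, flags=re.MULTILINE)


# Step 3's pattern; \n stands in for Notepad++'s \R, which Python's re lacks.
STEP3 = r"1(.+$)\n(?=1\1$)"
# Hypothetical weaker variant without the trailing $ inside the lookahead.
NO_END_ANCHOR = r"1(.+$)\n(?=1\1)"

print(fold(STEP3, "1ALEX\n1ALEX\n1ALEX"))  # 111ALEX  (true duplicates merge)
print(fold(STEP3, "1AALEX\n1ALEX"))        # unchanged: \1 must be the whole rest of the line
print(fold(STEP3, "1ALE\n1ALEX"))          # unchanged: the lookahead's $ rejects a mere prefix
print(fold(NO_END_ANCHOR, "1ALE\n1ALEX"))  # 11ALEX  (ALE wrongly counted as ALEX)
```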

This obviously doesn't work if some of your words start with 1; in that case, just use a different character that doesn't occur in your text as the counting character.
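To see the failure and the fix side by side, here's a hedged Python sketch of steps 1 to 4 with the counting character as a parameter. tally and mark are hypothetical names, Python's re stands in for Notepad++'s engine, and \n replaces \R. With the default 1, the word's own leading 1 gets absorbed into the tally; with # it doesn't.

```python
import re


def tally(lines, mark="1"):
    """Steps 1-4 of the answer, with the counting character as a parameter."""
    m = re.escape(mark)  # mark is used inside patterns, so escape it
    text = "\n".join(sorted(lines))                                        # step 1
    text = re.sub(r"^", lambda _: mark, text, flags=re.M)                  # step 2
    # Step 3; \n stands in for Notepad++'s \R.
    text = re.sub(rf"{m}(.+$)\n(?={m}\1$)", lambda _: mark, text, flags=re.M)
    text = re.sub(rf"^({m}*)", lambda g: g.group(1) + " ", text, flags=re.M)  # step 4
    return text.splitlines()


words = ["1ST", "1ST", "2ND"]
print(tally(words))            # ['111 ST', '1 2ND']  (wrong: the word's own 1 is read as a tally)
print(tally(words, mark="#"))  # ['## 1ST', '# 2ND']
```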

I liked doing it this way, with unary numbers at the start of lines, since:

  1. The result is effectively a bar chart.
  2. Results are easily orderable by count, as mentioned in step 5.

  • If you don't need points 1 and 2 and want the numbers at the end of each line as in your question, replace ^(1*)(.+)$ with \2 \(\1\) instead of doing steps 4 and 5 above.
  • If you need to see a decimal number instead of a unary one (unary numbers get hard to read once they go above 4 or 5), just double-click the number to select it: the Notepad++ status bar shows the number of selected characters, which is the count in decimal (e.g. Sel : 7).
  • All these steps are macro-recordable, so you can record them once and then replay them whenever you like with a shortcut or from the menu.
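Here's a small Python sketch of the end-of-line variant and of reading a tally as a decimal. Assumptions: Python's re stands in for Notepad++'s engine; folded is a hypothetical variable hard-coded to what steps 1 to 3 produce for the question's sample; and Python's replacement syntax treats parentheses literally, so the \( \) escaping used in the Notepad++ version isn't needed here. The decimal count is simply the tally's length, which is what the status bar's "Sel :" figure reports.

```python
import re

# Output of steps 1-3 for the question's sample (sorted, tallies still glued to words).
folded = "1111ALEX\n11AMBER\n1CHRIS\n11JOSHUA\n1MICHAEL\n1MIKE\n1ROBERT\n1ROOGER"

# End-of-line variant: ^(1*)(.+)$ -> \2 (\1)
suffixed = re.sub(r"^(1*)(.+)$", r"\2 (\1)", folded, flags=re.MULTILINE)
print(suffixed.splitlines()[0])  # ALEX (1111)

# Decimal variant: replace each unary tally with its length.
decimal = re.sub(
    r"^(1*)(.+)$",
    lambda m: f"{m.group(2)} ({len(m.group(1))})",
    folded,
    flags=re.MULTILINE,
)
print(decimal.splitlines()[0])   # ALEX (4)
```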

So, in your example, this would give:

1111 ALEX
11 JOSHUA
11 AMBER
1 ROOGER
1 ROBERT
1 MIKE
1 MICHAEL
1 CHRIS

Alternatively, with the counts at the end of each line:

ALEX (1111)
JOSHUA (11)
AMBER (11)
ROOGER (1)
ROBERT (1)
MIKE (1)
MICHAEL (1)
CHRIS (1)

Upvotes: 1

Hans Franke

Reputation: 71

There is no built-in word frequency counter. The available regex operations do not allow inserting counter variables.

The built-in Smart Highlighting only highlights all occurrences of the word currently selected. The same goes for the Count function of the Find dialog (search for all instances of one word, read the count, then repeat for the next word). For short lists, this one-word-at-a-time approach might work.
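For illustration, the one-word-at-a-time approach boils down to something like the following Python sketch. It's only a stand-in, not how Notepad++ implements Count: count_like_find_dialog and TEXT are hypothetical names, and anchoring each search as ^WORD$ in multiline mode mimics matching whole lines (in Notepad++'s regex mode you could search for ^ALEX$ the same way).

```python
import re

TEXT = "ALEX\nMIKE\nJOSHUA\nAMBER\nALEX\nROBERT\nCHRIS\nALEX\nJOSHUA\nMICHAEL\nROOGER\nALEX\nAMBER"


def count_like_find_dialog(text, word):
    """One manual "Count" run: count lines that are exactly `word`."""
    return len(re.findall(rf"^{re.escape(word)}$", text, flags=re.MULTILINE))


# Repeating the search once per distinct word is the manual loop described above.
for word in dict.fromkeys(TEXT.splitlines()):  # distinct words, first-seen order
    print(f"{word} ({count_like_find_dialog(TEXT, word)})")
```

Each distinct word needs its own full search, which is why this only stays practical for short lists.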

Unless you're about to write a new plugin or some external program, using a web service might be a quick solution (Word Frequency Counter or WordCounter).

On Unix/Linux, sort file.txt | uniq -c | sort -nr gives the intended result, with the most frequent words first.
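If you're not on Unix/Linux, a rough Python equivalent of that pipeline could look like this. It's a sketch, not a drop-in replacement: frequencies and SAMPLE are hypothetical names, the padding only approximates uniq -c's output, and words with equal counts keep their first-seen order, which may differ from how sort -nr breaks ties.

```python
from collections import Counter

SAMPLE = "ALEX\nMIKE\nJOSHUA\nAMBER\nALEX\nROBERT\nCHRIS\nALEX\nJOSHUA\nMICHAEL\nROOGER\nALEX\nAMBER"


def frequencies(text):
    """(word, count) pairs, highest count first."""
    # Counter tallies identical lines (the sort | uniq -c part); most_common()
    # orders them by count, highest first (the sort -nr part).
    return Counter(text.splitlines()).most_common()


for word, n in frequencies(SAMPLE):
    print(f"{n:7d} {word}")
```

To run it on a real file, pass it the file's contents, e.g. read with open("file.txt", encoding="utf-8").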

Upvotes: 2
