Reputation: 31
I have a text file containing a list like the following example:
ALEX
MIKE
JOSHUA
AMBER
ALEX
ROBERT
CHRIS
ALEX
JOSHUA
MICHAEL
ROOGER
ALEX
AMBER
I want to count how many times each word occurs.
Example:
ALEX (4)
MIKE (1)
JOSHUA (2)
AMBER (2)
etc.
How can I do that in Notepad++?
Upvotes: 3
Views: 5372
Reputation: 241
While I don't know of an easy way to do this in a usual number system (e.g. decimal or binary) without using something like Python Script or another plugin, I figured that I can count them in the unary numeral system, and I get a free bar chart in the process :)
For all replacements, select "Regular expression" with ". matches newline" unchecked; check or uncheck "Match case" as you desire.
Edit -> Line Operations -> Sort Lines Lexicographically Ascending
Replace ^ with 1 to add 1 to the start of every line
Replace 1(.+$)\R(?=1\1$) with 1 to remove duplicates while preserving 1s
Replace ^(1*) with \1 followed by a space to add a space after counts
Edit -> Line Operations -> Sort Lines Lexicographically Descending
I wrote it this way so that AALEX and ALEX aren't processed as duplicates, nor are ALE and ALEX, but also so that the regex can do it in one go without hitting replace repeatedly.
This obviously doesn't work if some of your words start with 1; if that is the case, just use a different character that doesn't occur in your text as the counting character.
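The same sequence of sorts and replacements can be replayed outside Notepad++ to check what each step does. The following is only a sketch in Python, not part of the original answer: the function name `unary_counts` and the `WORDS` sample are made up for illustration, and `\R` is swapped for `\n` because Python's `re` module does not support `\R`.

```python
import re

# Sample list from the question (hypothetical variable name).
WORDS = """ALEX
MIKE
JOSHUA
AMBER
ALEX
ROBERT
CHRIS
ALEX
JOSHUA
MICHAEL
ROOGER
ALEX
AMBER"""


def unary_counts(text: str) -> str:
    """Replay the Notepad++ steps; assumes no word contains the digit 1."""
    # Sort lines ascending so duplicates become adjacent.
    text = "\n".join(sorted(text.splitlines()))
    # Add a 1 to the start of every line.
    text = re.sub(r"^", "1", text, flags=re.MULTILINE)
    # Fold each line into an identical following line, keeping its 1.
    # Python's re has no \R, so \n stands in for the line break here.
    text = re.sub(r"1(.+$)\n(?=1\1$)", "1", text, flags=re.MULTILINE)
    # Add a space after the run of 1s.
    text = re.sub(r"^(1*)", r"\1 ", text, flags=re.MULTILINE)
    # Sort descending: longer runs of 1s sort first, because "1" > " ".
    return "\n".join(sorted(text.splitlines(), reverse=True))


print(unary_counts(WORDS))
```

The lookahead `(?=1\1$)` is what keeps ALE and ALEX apart: a line is only folded into the next one when the next line is the same word up to the end of the line.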
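I liked this method of doing this, with unary numbers at the start of lines. To put the counts after the words instead, replace ^(1*)(.+)$ with \2 \(\1\) (run this in place of the add-a-space step, or use ^(1*) (.+)$ if the space is already there). Selecting a run of 1s shows its length in the status bar (for example Sel : 7).
So, in your example, this would give: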
1111 ALEX
11 JOSHUA
11 AMBER
1 ROOGER
1 ROBERT
1 MIKE
1 MICHAEL
1 CHRIS
Alternatively:
ALEX (1111)
JOSHUA (11)
AMBER (11)
ROOGER (1)
ROBERT (1)
MIKE (1)
MICHAEL (1)
CHRIS (1)
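If decimal counts like ALEX (4) are needed, the unary prefixes have to be converted outside the plain find/replace dialog. Here is a minimal sketch of that conversion in Python, assuming lines of the form "1111 ALEX" as produced above; the function name `unary_to_decimal` is hypothetical.

```python
import re


def unary_to_decimal(text: str) -> str:
    """Turn lines like '1111 ALEX' into 'ALEX (4)'.

    Assumes each line is a run of 1s, one space, then the word.
    """
    pattern = re.compile(r"^(1+) (.+)$", re.MULTILINE)
    # The decimal count is simply the length of the run of 1s.
    return pattern.sub(lambda m: f"{m.group(2)} ({len(m.group(1))})", text)


print(unary_to_decimal("1111 ALEX\n11 JOSHUA\n11 AMBER\n1 CHRIS"))
```

Lines that do not match the assumed shape are left unchanged.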
Upvotes: 1
Reputation: 71
There is no built-in word frequency counter. The available RegExp operations do not allow the insertion of counting variables.
The built-in smart highlighting will only show all occurrences of the currently selected text. The same goes for the count functionality of the find dialog (match all instances of a word, the count will be shown, then repeat). For short lists such single-stepping might work.
Unless you're about to write a new plugin or some external program, using a web service might be a quick solution (Word Frequency Counter or WordCounter).
On Unix/Linux, sort file.txt | uniq -c | sort -nr will give the intended result.
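Where those Unix tools are not available, a small external program can do the same counting. This is only a sketch in Python using the standard library's `collections.Counter`; the function name `word_frequencies` and the inline `SAMPLE` text are assumptions for illustration, and in practice the text would be read from the file.

```python
from collections import Counter

# Sample list from the question (hypothetical variable name).
SAMPLE = """ALEX
MIKE
JOSHUA
AMBER
ALEX
ROBERT
CHRIS
ALEX
JOSHUA
MICHAEL
ROOGER
ALEX
AMBER"""


def word_frequencies(text):
    """Return (word, count) pairs, most frequent first; blank lines are skipped."""
    words = [line.strip() for line in text.splitlines() if line.strip()]
    return Counter(words).most_common()


for word, count in word_frequencies(SAMPLE):
    print(f"{word} ({count})")
```

Unlike the pipeline above, this compares lines exactly as written, so ALEX and alex would be counted separately unless the lines are normalized first.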
Upvotes: 2