I am trying to come up with a regular expression that I can use in Notepad++ (or a similar tool) to find lines in a .txt file that contain names in ALL CAPS. Once I find a matching line, I want to add three line breaks before it.
Because the lines are names, there are several variations to handle: some names are only two characters long, some contain hyphens, some contain several names, and some have no space after the comma that follows the last name. Here are some examples:
I can use other programs as well; I'm just trying to figure this out so I can get it finished.
EDIT: I was using the pattern [A-Z]+, [A-Z]+ but it didn't select the whole line, and it didn't account for spaces and hyphens.
ANSWER: The following regex met my needs:
^(?!.*[a-z])(?!.*[0-9]).+$
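To see why the anchored lookahead version works where the first attempt did not, here is a small Python sketch (Python is my choice here, not something from the question). The sample lines are hypothetical, modeled on the name formats described above, and re.MULTILINE is assumed to stand in for Notepad++'s per-line ^ and $ behavior.

```python
import re

# Hypothetical sample input: the name formats described in the question,
# mixed with an ordinary sentence-case line that should NOT match.
SAMPLE = """DOE, JOHN
Notes about the previous person.
DOE-SMITH, JOHN
DO, JO
DOE, JOHN BOB
DOE,JOHN
"""

# The first attempt from the question: it matches only a fragment of a line
# and requires a space after the comma.
first_try = re.compile(r"[A-Z]+, [A-Z]+")

# The accepted pattern: a whole line (^...$) with no lowercase letter and no
# digit anywhere in it. re.MULTILINE makes ^ and $ match at each line boundary.
whole_line = re.compile(r"^(?!.*[a-z])(?!.*[0-9]).+$", re.MULTILINE)

print(first_try.findall(SAMPLE))
print(whole_line.findall(SAMPLE))
```

The first pattern returns only fragments (SMITH, JOHN instead of DOE-SMITH, JOHN) and misses DOE,JOHN entirely, while the anchored pattern returns every all-caps line in full. Note that it would also match an all-caps line that contains no name at all, such as a line of dashes.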
Part 2 ANSWER: To handle the second part of my request, adding three line breaks ahead of each matched line, I combined the two lookaheads and wrapped the whole line in a capture group:
^((?!.*[a-z\d]).+)$
I also made sure Match case was ticked and the search mode was set to Regular expression, and I replaced with the following:
\n\n\n\1
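For anyone doing the same replacement outside Notepad++, here is a rough Python equivalent (my own sketch, not what the asker ran). It uses the Part 2 pattern and the same \n\n\n\1 replacement on a hypothetical input string.

```python
import re

# Hypothetical input; the "Notes ..." lines stand in for any non-name line.
text = "Notes about the previous person.\nDOE, JOHN\nMore notes.\nDOE,JOHN\n"

# Part 2 pattern from the question: group 1 captures the whole all-caps line.
# Python's re is case-sensitive by default, which mirrors having
# "Match case" ticked in Notepad++.
pattern = re.compile(r"^((?!.*[a-z\d]).+)$", re.MULTILINE)

# Same replacement as in Notepad++: three newlines, then the captured line.
# re.sub turns \n in the replacement template into a real newline.
result = pattern.sub(r"\n\n\n\1", text)
print(result)
```

Case sensitivity matters here: with re.IGNORECASE (or Match case unticked in Notepad++), [a-z] also matches capital letters, so no line containing letters would match. On a file with Windows (CRLF) line endings, \r\n\r\n\r\n\1 may be a better replacement so the inserted breaks match the rest of the file.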
Thanks Everyone!
Use a negative lookahead for a lowercase character:
^(?!.*[a-z]).+$
This matches "any line that doesn't contain a lowercase letter".
To also disallow numbers:
^(?!.*[a-z\d]).+$
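A quick way to see what the extra \d buys you is to run both patterns over a few lines. This Python sketch uses hypothetical inputs; ROOM 101 stands in for any all-caps line that contains digits.

```python
import re

# Hypothetical lines: an all-caps name, an all-caps line with digits,
# and a mixed-case name.
lines = ["DOE, JOHN", "ROOM 101", "Doe, John"]

no_lower = re.compile(r"^(?!.*[a-z]).+$")               # first version
no_lower_or_digit = re.compile(r"^(?!.*[a-z\d]).+$")    # also rejects digits

caps_only = [line for line in lines if no_lower.match(line)]
caps_no_digits = [line for line in lines if no_lower_or_digit.match(line)]

print(caps_only)       # ROOM 101 is still accepted
print(caps_no_digits)  # only the name survives
```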
This works for the corpus you provided, using GNU grep. Adapt it to suit any changes to your data.
$ grep \
--extended-regexp \
--only-matching \
--regexp='[[:upper:]-]+, ?[[:upper:]]+' \
/tmp/corpus
DOE, JOHN
DOE-SMITH, JOHN
DO, JO
DOE, JOHN
DOE,JOHN
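If grep isn't available (for example on Windows), roughly the same extraction can be sketched in Python. Python's re module has no POSIX [[:upper:]] class, so [A-Z] is swapped in, which assumes plain ASCII capitals; the corpus list is hypothetical, built from the names that appear in this answer's output.

```python
import re

# Hypothetical corpus, built from the names shown in the grep/sed output.
corpus = ["DOE, JOHN", "DOE-SMITH, JOHN", "DO, JO", "DOE, JOHN BOB", "DOE,JOHN"]

# [A-Z] replaces [[:upper:]] (assumes ASCII names); the trailing "-" inside
# the first class is a literal hyphen, and " ?" makes the space optional.
name = re.compile(r"[A-Z-]+, ?[A-Z]+")

# Like grep --only-matching: keep only the matched part of each line.
only_matching = [m.group() for line in corpus for m in name.finditer(line)]

for match in only_matching:
    print(match)
```

As with grep -o, the line DOE, JOHN BOB yields only DOE, JOHN, because the second [A-Z]+ stops at the space.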
To add the line breaks, you can use the a (append) command in GNU sed, which adds text after each matching line. For example:
$ sed \
--regexp-extended '/[[:upper:]-]+, ?[[:upper:]]+/a\\n\n\n' \
/tmp/corpus
DOE, JOHN
DOE-SMITH, JOHN
DO, JO
DOE, JOHN BOB
DOE,JOHN
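The same idea as a small Python generator, again with [A-Z] standing in for [[:upper:]]; the helper name add_breaks and its before flag are hypothetical. Since sed's a command appends after the matching line but the question asked for breaks ahead of it, the sketch lets you choose either placement.

```python
import re

# [A-Z] stands in for [[:upper:]] (assumes ASCII names).
NAME = re.compile(r"[A-Z-]+, ?[A-Z]+")


def add_breaks(lines, before=False):
    """Yield lines, adding three empty lines next to each line containing a name.

    before=False mimics sed's a (append) command: the breaks follow the line.
    before=True puts the breaks ahead of the line, as the question asked.
    """
    for line in lines:
        hit = NAME.search(line) is not None
        if hit and before:
            yield from ("", "", "")
        yield line
        if hit and not before:
            yield from ("", "", "")


sample = ["Notes about the previous person.", "DOE-SMITH, JOHN"]
print("\n".join(add_breaks(sample, before=True)))
```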