nonremovable

Reputation: 836

Using sed to eliminate all lines that do not match the desired form

I have a single-column CSV file that looks something like this:

KFIG
KUNV
K~LK
K7RT
3VGT

Some of the data points are garbled in transmission. I need to keep only the entries that begin with a capital letter, followed by three characters that can each be a capital letter OR a digit. For example, in the list above I would have to delete K~LK and 3VGT.

I know that to keep only the lines containing four or more capital letters in a row I can write

sed -n '/[A-Z]\{4,\}/p'

I just want to adjust this so that the last three characters can be capital letters or digits. Any help would be appreciated.

Upvotes: 1

Views: 29

Answers (1)

werkritter

Reputation: 1689

Just use:

sed -n '/[A-Z][A-Z0-9]\{3,\}/p'

However, if these identifiers are really all there is in the file, I would suggest the following command instead. It ensures that the whole line is matched, so it will reject, for example, identifiers more than 4 characters long:

sed -n '/^[A-Z][A-Z0-9]\{3\}$/p'
  • ^ means "match zero-length string at the beginning of line";
  • \{3\} means "match exactly 3 occurrences of the previous atom", the previous atom being [A-Z0-9];
  • $ means "match zero-length string at the end of line".
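
As a quick check, here is a minimal sketch that runs the anchored pattern on the sample data from the question. The function name filter_ids and the extra over-long entry KFIGX are made up for illustration, and LC_ALL=C is an added assumption so that [A-Z] matches only ASCII capitals regardless of the current locale:

```shell
# filter_ids is a hypothetical helper name, not part of sed.
# It reads lines on stdin and prints only those that are exactly
# one capital letter followed by three capital letters or digits.
filter_ids() {
    # LC_ALL=C (an assumption added here) keeps [A-Z] limited to ASCII capitals.
    LC_ALL=C sed -n '/^[A-Z][A-Z0-9]\{3\}$/p'
}

# Sample data from the question, plus KFIGX (illustrative, 5 characters long).
printf '%s\n' KFIG KUNV 'K~LK' K7RT 3VGT KFIGX | filter_ids
# prints:
# KFIG
# KUNV
# K7RT
```

Without the ^ and $ anchors, KFIGX would also be printed, because the pattern would match its first four characters.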

Upvotes: 2
