Using sed to eliminate all lines that do not match the desired form

Question

I have a single column csv that looks something like this:

KFIG
KUNV
K~LK
K7RT
3VGT

Some of the data points were garbled in transmission. I need to keep only the entries that begin with a capital letter and whose other three characters are each a capital letter or a digit. For example, in the list above I would have to delete K~LK and 3VGT.

I know that to keep only the lines containing four capital letters I can write

sed -n '/[A-Z]\{4,\}/p'

I just want to adjust this so that the last three characters can be capital letters or digits. Any help would be appreciated.

werkritter · Accepted Answer

Just use:

sed -n '/[A-Z][A-Z0-9]\{3,\}/p'

However, if these identifiers are really all there is in the file, I would suggest the following command instead. It ensures that the whole line is matched, so it will reject, for example, identifiers more than 4 characters long:

sed -n '/^[A-Z][A-Z0-9]\{3\}$/p'

^ means "match zero-length string at the beginning of line";
\{3\} means "match exactly 3 occurrences of the previous atom", the previous atom being [A-Z0-9];
$ means "match zero-length string at the end of line".
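
To see the anchored pattern at work, here is a minimal sketch that runs it over the sample data from the question. The function name filter_ids and the extra line KFIGX are made up for illustration, and the LC_ALL=C prefix is an addition of mine: it is there on the assumption that your locale might otherwise let [A-Z] match more than the ASCII capitals.

```shell
# Hypothetical helper: reads identifiers on stdin, prints only the valid ones.
filter_ids() {
    # LC_ALL=C (an assumption, not part of the original answer) keeps
    # the [A-Z] range limited to ASCII capital letters.
    LC_ALL=C sed -n '/^[A-Z][A-Z0-9]\{3\}$/p'
}

# Sample data from the question, plus one overlong line (KFIGX) added
# here to show the effect of the ^ and $ anchors.
printf '%s\n' KFIG KUNV 'K~LK' K7RT 3VGT KFIGX | filter_ids
```

This should print KFIG, KUNV and K7RT. The unanchored command at the top of this answer would also let KFIGX through, since it only looks for a matching run of characters somewhere in the line.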
