soosus

Reputation: 1217

Perl: Delete a string that does NOT start with a pattern

I have this input:

NP_001239382 1002 A G
NP_001074602 1005 A V
NP_001230039 100 A V
NP_932785 100 A V
NP_001164038 1018 A S

and want to turn it into this using some kind of regex:

NP_001239382
NP_001074602
NP_001230039
NP_932785
NP_001164038

Essentially, the constraint is "delete the string if it doesn't start with 'NP'", but I'm not exactly sure how to do this.

Upvotes: 0

Views: 157

Answers (2)

hmatt1

Reputation: 5139

I posted this as a comment, but since it was all soosus was looking for, I'll post it as an answer.

In this case, we don't have to match and remove words that don't start with NP. Since every string we want is the first word on its line, we just need to remove everything after it. We can do that with this one-liner:

perl -ple 's/[ \t].*//' input.txt > output.txt

It removes everything after and including the first space or tab on each line, leaving us with the desired string.

Upvotes: 3

ysth

Reputation: 98398

Fairly simple:

s/(?:\s++|(?<!\S))(?!NP)\S++//g;

though it would help if you said more about the problem: are these lines in a file? In an array? All together in one string?

This removes runs of non-whitespace characters that don't start with NP, along with any preceding whitespace. When there is no preceding whitespace, the (?<!\S) lookbehind makes sure the match starts at the beginning of a word rather than in the middle of an NP string.

Upvotes: 0
