ojholland

Reputation: 5

Is there a way to remove everything before and including a tab (or space) for each line of a file using sed?

I have a file where I want to remove everything before and including the first space for each line. For example, if my file looks like this:

>JQ907469.1 Gracilariopsis mclachlanii voucher BG0072 23S ribosomal RNA gene, partial sequence; plastid
>JQ907467.1 Gracilariopsis longissima voucher BG0052 23S ribosomal RNA gene, partial sequence; plastid
>JQ907456.1 Hydropuntia rangiferina voucher BG0092 23S ribosomal RNA gene, partial sequence; plastid
>JQ907428.1 Gracilaria cornea voucher BG0112 23S ribosomal RNA gene, partial sequence; plastid
>JQ952662.1 Gracilariopsis tenuifrons voucher BG0042 23S ribosomal RNA gene, partial sequence; plastid

I want it to look like this:

Gracilariopsis mclachlanii voucher BG0072 23S ribosomal RNA gene, partial sequence; plastid
Gracilariopsis longissima voucher BG0052 23S ribosomal RNA gene, partial sequence; plastid
Hydropuntia rangiferina voucher BG0092 23S ribosomal RNA gene, partial sequence; plastid
Gracilaria cornea voucher BG0112 23S ribosomal RNA gene, partial sequence; plastid
Gracilariopsis tenuifrons voucher BG0042 23S ribosomal RNA gene, partial sequence; plastid

I assume I can use sed to achieve my goal, but I'm not yet familiar enough with its notation and syntax to experiment. With that in mind, if someone has a solution, I'd love it if they could also explain why the code works the way it does.

Cheers

Upvotes: 0

Views: 522

Answers (1)

tink

Reputation: 15206

Employing a regex, and assuming you're using a reasonably current GNU sed:

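sed -r 's/^[^ \t]+[ \t]//' yourfile

That prints the result to the terminal and leaves the file untouched. If you're happy with how the output looks, add -i to change the file itself:

sed -i -r 's/^[^ \t]+[ \t]//' yourfile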

How does it work?

s/ starts a search & replace (substitute) command.

^[^ \t]+[ \t] is a regular expression that means: starting at the beginning of the line (^), match one or more characters that are neither a space nor a TAB ([^ \t]+), followed by the first space or TAB ([ \t]).

// The slashes, together with the one in s/ above, are separators. The bit between the first and second slash is the search pattern; the bit between the second and third is the replacement (in your case, nothing).

-r tells GNU sed to use extended regular expression syntax, in which + means "one or more".

-i tells it to modify the file in place.
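To try the pattern explained above without touching a real file, here is a minimal sketch that feeds two sample lines through it on standard input. It assumes GNU sed (for -r and for \t inside the brackets); the function name strip_first_field is made up for illustration and is not part of sed.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not a sed feature):
# reads lines on stdin and removes everything up to and including the
# first space or TAB on each line. Assumes GNU sed.
strip_first_field() {
    sed -r 's/^[^ \t]+[ \t]//'
}

# One space-separated and one TAB-separated sample line,
# shortened from the question's data.
printf '>JQ907469.1 Gracilariopsis mclachlanii voucher BG0072\n' | strip_first_field
printf '>JQ907428.1\tGracilaria cornea voucher BG0112\n' | strip_first_field
```

With GNU sed this should print the two lines without their leading >JQ... identifiers. A line that contains no space or TAB at all does not match the pattern and is passed through unchanged.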

Upvotes: 1
