Reputation: 3

Extracting a text sub-string from tab-separated table

I have a very large tab-separated file (225,000 rows by 16 columns). From each row I would like to extract a substring and write the results to a separate file. For example, in the row below I'd like to extract the number 516315992, which sits between gi| and the next |, and do the same for all 225,000 rows.

This is one row from the table as the example:

M01522:132:000000000-A4LNU:1:2114:14381:3858    gi|516315992|ref|WP_017712686.1|    317 153 19  74  2e-09   60.1    53.57   56  25  1   1223    N/A N/A hypothetical protein [Prochlorothrix hollandica]

I would like to extract the number 516315992 from that text string and make a table (one column, n rows) into a separate file.

I am a REAL newbie but willing to put in the time it takes.

I appreciate your help.

Take care

Raul

Upvotes: 0

Answers (4)

Jotne

Reputation: 41460

If every line in the file has the same format, this simple awk command should do:

awk -F\| '{print $2}' org_file > new_file

It splits each line on | and prints the second field.
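If you want to be a bit stricter, here is a minimal sketch that splits on tabs first and only then splits the second column on |. It assumes the ID always sits in the second tab-separated column, as in the sample row. The file name sample.tsv and the variable names are just placeholders for illustration, and the sample row is shortened to its first four columns.

```shell
# Build a one-row sample file (hypothetical name) from the row in the question.
printf 'M01522:132:000000000-A4LNU:1:2114:14381:3858\tgi|516315992|ref|WP_017712686.1|\t317\t153\n' > sample.tsv

# Split each line on tabs, then split column 2 on "|" and keep the piece after "gi".
result=$(awk -F'\t' '{ split($2, parts, "|"); print parts[2] }' sample.tsv)
echo "$result"
```

On the real file you would replace sample.tsv with org_file and redirect the output to new_file as above. This way a stray | in some other column should not shift the field you get back.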

Upvotes: 0

potong

Reputation: 58473

This might work for you (GNU sed):

 sed 's/^[^|]*|\([^|]*\)|.*/\1/' file

Upvotes: 0

wsdookadr

Reputation: 2672

You can use this oneliner:

perl -F"\|" -a -ne 'print "$F[1]\n"' input.txt > result.txt

Note that -F takes the pattern used to split each input line as its argument. With -a, the fields produced by the split are stored in the array @F, so $F[1] is the second field.

The output is redirected to result.txt

See perlrun for more details on Perl's command-line switches.

Upvotes: 1

Miller

Reputation: 35208

In a one liner:

perl -ne 'print "$1\n" if /gi\|([^|]*)/' file.txt

Upvotes: 0
