Reputation: 3
I have a large tab-separated file (225,000 rows by 16 columns). I would like to extract a substring from each row and write the results to a separate file. For example, in the row below I'd like to extract the number 516315992, which sits between gi| and the next |, and do the same for all 225,000 rows.
Here is one row from the table as an example:
M01522:132:000000000-A4LNU:1:2114:14381:3858 gi|516315992|ref|WP_017712686.1| 317 153 19 74 2e-09 60.1 53.57 56 25 1 1223 N/A N/A hypothetical protein [Prochlorothrix hollandica]
I would like to extract the number 516315992 from that text string and write the results to a separate file as a table (one column, n rows).
I am a REAL newbie but willing to put in the time it takes.
I appreciate your help.
Take care
Raul
Upvotes: 0
Views: 130
Reputation: 41460
If the data in the file are all in the same format, this simple awk command should do:
awk -F\| '{print $2}' org_file > new_file
It splits each line on | and prints the second field.
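If some rows might not follow the gi|number| layout, a slightly more defensive variant of the same split may help. This is only a sketch: extract_gi is a hypothetical helper name, and the checks (at least two | separators, second field all digits) are assumptions about the data rather than something stated in the question.

```shell
# Hypothetical helper: read rows on stdin, print the gi number of each row.
extract_gi() {
    # Split on "|". Print field 2 only when the row has at least two
    # separators and field 2 is purely numeric; other rows are skipped.
    awk -F'|' 'NF >= 3 && $2 ~ /^[0-9]+$/ { print $2 }'
}
```

It would be used like the command above, for example `extract_gi < org_file > new_file`. Rows that do not match are dropped silently, so comparing the line counts of the two files shows whether anything was skipped.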
Upvotes: 0
Reputation: 58473
This might work for you (GNU sed):
sed 's/^[^|]*|\([^|]*\)|.*/\1/' file
Upvotes: 0
Reputation: 2672
You can use this one-liner:
perl -F"\|" -a -ne 'print "$F[1]\n"' input.txt > result.txt
Notice that -F takes the pattern used to split the input as its parameter. The array @F inside the one-liner then contains the fields that result from the split.
The output is redirected to result.txt.
See perlrun for more details on Perl's command-line switches.
Upvotes: 1