A. Palmer

Reputation: 149

Utilising variables in tail command

I am trying to extract characters from a reference file at known byte positions. To do this, I have a long list of numbers stored in a variable, which I pass as the input to a tail command.

For example, the reference file looks like:

ggaaatgcattcaaacatgc

And the list looks like:

5
10
7
15

I have tried using this code:

list=$(<pos.txt)
echo "$list"
cat ref.txt | tail -c +"list" | head -c1 > out.txt

However, it keeps returning "invalid number of bytes: '+5\n10\n7\n15...'"

My expected output would be

a
t
g
a
... 

Can anybody tell me what I'm doing wrong? Thanks!

Upvotes: 2

Views: 1925

Answers (3)

oguz ismail

Reputation: 50815

You could use cut instead of tail:

pos=$(<pos.txt)
cut -c ${pos//$'\n'/,} --output-delimiter=$'\n' ref.txt

Or just awk:

awk -F '' 'NR==FNR{c[$0];next} {for(i in c) print $i}' pos.txt ref.txt

Both yield (in ascending position order for this input, not list order):

a
g
t
a
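If the output has to follow the order of the list (the question expects a, t, g, a), one possible sketch is to call cut once per position. It assumes the reference file is a single line, as in the question; the scratch directory and the variable names workdir and p are only illustrative.

```shell
# Hypothetical scratch directory holding the sample files from the question.
workdir=$(mktemp -d)
printf 'ggaaatgcattcaaacatgc\n' > "$workdir/ref.txt"
printf '5\n10\n7\n15\n' > "$workdir/pos.txt"

# One cut call per position: the output follows the order of pos.txt,
# and cut ends each selected character with a newline.
while IFS= read -r p; do
    cut -c "$p" "$workdir/ref.txt"
done < "$workdir/pos.txt" > "$workdir/out.txt"

cat "$workdir/out.txt"
```

This forks one process per position, so it is likely to be slow for a long list; it is shown here only because it keeps the list order.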

Upvotes: 1

Inian

Reputation: 85895

The reason your command fails is simple. The variable list contains a multi-line string read from the pos.txt file, newlines included. You cannot pass more than one integer value to the -c flag.

Your attempt can be fixed quite easily by removing the call to cat and reading the positions from the file one line at a time:

while IFS= read -r lineNo; do
    tail -c +"$lineNo" ref.txt | head -c1
done < pos.txt

But if your intention is to print each character on its own line, head does not output that way. For your given input it produces the single-line string atga rather than one character per line.
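A minimal sketch of one way to get one character per line is to emit a newline after each head call. The scratch directory and the names tmpdir and offset are illustrative assumptions, and head -c is widely available but not specified by POSIX.

```shell
# Hypothetical scratch directory with the sample data from the question.
tmpdir=$(mktemp -d)
printf 'ggaaatgcattcaaacatgc\n' > "$tmpdir/ref.txt"
printf '5\n10\n7\n15\n' > "$tmpdir/pos.txt"

while IFS= read -r offset; do
    # tail -c +N starts output at byte N; head -c 1 keeps only that byte.
    tail -c +"$offset" "$tmpdir/ref.txt" | head -c 1
    # head adds no line break of its own, so add one explicitly.
    echo
done < "$tmpdir/pos.txt" > "$tmpdir/out.txt"

cat "$tmpdir/out.txt"
```

Redirecting the whole loop once, rather than appending inside it, means out.txt is rewritten on every run instead of growing.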

As Gordon mentions in one of the comments, for much more efficient processing of FASTA files you could use a single invocation of awk (avoiding multiple forks to head/tail). Your provided input does not contain any header lines to skip, so it is as straightforward as:

awk 'FNR==NR { n = split($0, arr, ""); for (i = 1; i <= n; i++) hash[i] = arr[i]; next }
     ($0 in hash) { print hash[$0] }' ref.txt pos.txt

Upvotes: 2

dozerman

Reputation: 307

It looks like you are trying to access your list variable in your tail command. You need to reference it as $list; putting quotes around the bare name list only passes the literal word list.

Your logic is flawed even after fixing the variable access. The list variable holds every line of your pos.txt file, including the newline characters (\n) between them. These are invisible in many UIs and programs, but they are still there when a program expects a single number. You need to feed the lines one by one to make it work properly.

Also, tail -c N without a leading + counts from the end of the file. Unless those numbers are indexes from the end, you need either tail -c +N or to feed them to head instead of tail.

If I understood what you are attempting to do correctly, this should work:

while IFS= read -r line
do
  head -c "$line" ref.txt | tail -c 1 >> out.txt
done < pos.txt

Upvotes: 3
