Shushiro

Reputation: 582

Linux Bash: Use awk(substr) to get parameters from file input

I have a .txt-file like this:

'SMb_TSS0303'   '171765'    '171864'    '-' 'NC_003078' 'SMb20154'  
'SMb_TSS0302'   '171758'    '171857'    '-' 'NC_003078' 'SMb20154'

I want to extract the following as parameters:

-'SMb'

-'171765'

-'171864'

-'-' (minus)

I need them without the quotes.

I am trying to do this in a shell script:

#!/bin/sh
file=$1

cat "$1"|while read line; do
  echo "$line"
  parent=$(awk {'print substr($line,$0,5)'})
  echo "$parent"
done

This echoes 'SMb

As far as I understood awk's substr, I thought it would work like this:

substr(s, a, b) => returns b characters from string s, starting at position a

First, I do not understand why 0 and 5 extract 'SMb. Second, I cannot extract any of the other parameters I need, because moving the start position does not work. For example, $1,6 gives an empty echo, where I would expect Mb_TSS.
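A minimal sketch of how the field and position arguments behave, assuming a POSIX awk. Inside the single-quoted program, `$line` is not the shell variable: awk sees an unset awk variable `line`, which evaluates to 0, so `$line` means `$0`, the whole record. The second argument `$0` is that same record converted to a number, which is also 0. awk is also given no input file there, so it reads from the loop's standard input rather than from the shell variable. Positions in `substr` start at 1, and `$1`, `$2`, ... are the whitespace-separated fields. The sample line is copied from the question; the variable names `sample`, `shifted` and `via_var` are made up for illustration.

```shell
#!/bin/sh
# One sample line copied from the question (the variable name is illustrative).
sample="'SMb_TSS0303'   '171765'    '171864'    '-' 'NC_003078' 'SMb20154'"

# Feed the line to awk on stdin. $1 is the first whitespace-separated field;
# position 1 is its opening quote, so the prefix starts at position 2.
parent=$(printf '%s\n' "$sample" | awk '{ print substr($1, 2, 3) }')

# Moving the start works once a real field and a 1-based position are used.
shifted=$(printf '%s\n' "$sample" | awk '{ print substr($1, 6, 6) }')

# Alternative: hand the shell value to awk explicitly with -v.
via_var=$(awk -v line="$sample" 'BEGIN { split(line, f, " "); print substr(f[1], 2, 3) }')

echo "$parent"
echo "$shifted"
echo "$via_var"
```

Passing the value with `-v` avoids the pipe, but awk interprets backslash escapes in `-v` assignments, so the pipe form is the safer choice for arbitrary input.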

Desired final output:

#!/bin/sh

file=$1

cat "$1"|while read line; do
  parent=$(awk {'print substr($line,$0,5)'})
  start=$(awk {'print substr($line,?,?)'})
  end=$(awk {'print substr($line,?,?)'})
  strand=$(awk {'print substr($line,?,?)'})
done

echo "$parent"    -> echoes SMb
echo "$start"     -> echoes 171765
echo "$end"       -> echoes 171864
echo "$strand"    -> echoes -

My assumption is that the items in each line are treated as separate strings or something similar. Maybe I am also parsing the file incorrectly, but nothing I have tried works.

Upvotes: 2

Views: 539

Answers (2)

Jean-François Fabre

Reputation: 140266

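The question was originally tagged python, so let me propose a Python solution:

with open("input.txt") as f:
    for l in f:  # iterate over the lines of the open file
        # keep the first 4 fields, strip the quotes, keep the part before "_"
        data = [x.strip("'").partition("_")[0] for x in l.split()[:4]]
        print("\n".join(data))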

It opens the file, splits each line the way awk would, keeps only the first 4 fields, and strips off the quotes (and everything from the first underscore on) to build the list. It then prints the items separated by newlines.

That prints:

SMb
171765
171864
-
SMb
171758
171857
-

Upvotes: 1

arco444

Reputation: 22851

It's really unclear exactly what you're trying to do, but I can at least help you with the awk syntax:

while read -r line
do
    parent=$(echo "$line" | awk '{print substr($1,2,3)}')
    start=$(echo "$line" | awk '{print substr($2,2,6)}')
    echo "$parent"
    echo "$start"
done < file

This outputs:

SMb
171765
SMb
171758

You should be able to figure out how to get the rest of the fields.
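For reference, the same pattern extended to the other two fields might look like the sketch below. It assumes the layout shown in the question: coordinates that are always six digits wide and a one-character strand. The function name `extract_fields` is hypothetical and only there to keep the example self-contained.

```shell
#!/bin/sh
# Hypothetical helper: print parent, start, end and strand for one input line.
# Assumes fixed widths: 3-character prefix, 6-digit coordinates, 1-character strand.
extract_fields() {
    parent=$(printf '%s\n' "$1" | awk '{print substr($1,2,3)}')
    start=$(printf '%s\n' "$1" | awk '{print substr($2,2,6)}')
    end=$(printf '%s\n' "$1" | awk '{print substr($3,2,6)}')
    strand=$(printf '%s\n' "$1" | awk '{print substr($4,2,1)}')
    printf '%s %s %s %s\n' "$parent" "$start" "$end" "$strand"
}

# Sample line copied from the question.
extract_fields "'SMb_TSS0303'   '171765'    '171864'    '-' 'NC_003078' 'SMb20154'"
```

Each value starts at position 2 because position 1 of every field is the opening quote.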

This is quite an inefficient way to do this, but based on the information in the question I'm unable to provide a better answer at the moment.
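One possible way to cut the cost, sketched under the assumption that the four values are simply the first four fields with the quotes removed and that the parent is the part of the first field before the underscore: run awk once over the whole file and let the shell loop read the cleaned values. The function name `parse_tss` and the temporary demo file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read the file named in $1 once and print
# parent, start, end and strand for every line, separated by spaces.
parse_tss() {
    awk -v q="'" '{
        gsub(q, "")            # drop every quote character; awk re-splits the fields
        split($1, id, "_")     # id[1] is the part of field 1 before the underscore
        print id[1], $2, $3, $4
    }' "$1"
}

# Demo on the two sample lines from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
'SMb_TSS0303'   '171765'    '171864'    '-' 'NC_003078' 'SMb20154'
'SMb_TSS0302'   '171758'    '171857'    '-' 'NC_003078' 'SMb20154'
EOF

# read splits each cleaned line into four shell variables.
parse_tss "$tmp" | while read -r parent start end strand; do
    echo "parent=$parent start=$start end=$end strand=$strand"
done
rm -f "$tmp"
```

In most shells the loop on the right-hand side of a pipe runs in a subshell, so the variables are only usable inside the loop body, not after `done`.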

Upvotes: 1
