I have a .txt-file like this:
'SMb_TSS0303' '171765' '171864' '-' 'NC_003078' 'SMb20154'
'SMb_TSS0302' '171758' '171857' '-' 'NC_003078' 'SMb20154'
I want to extract the following as parameters:
-'SMb'
-'171765'
-'171864'
-'-' (minus)
I need all of them without the quotes.
I am trying to do this in a shell script:
#!/bin/sh
file=$1
cat "$1"|while read line; do
echo "$line"
parent=$(awk {'print substr($line,$0,5)'})
echo "$parent"
done
This echoes 'SMb.
As far as I understand awk's substr, I thought it would work like this:
substr(s, a, b) => returns b characters of string s, starting at position a
First, I don't understand why I get 'SMb with start 0 and length 5. Second, I can't extract any of the other values I need, because moving the start position does not work: e.g. $1,6 gives an empty echo, where I would expect Mb_TSS.
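To see where the four characters 'SMb come from, it helps to model what substr does with a start position below 1. In the script above, `$line` inside the single-quoted awk program is not the shell variable but awk's own unset variable `line` (value 0), so `$line` is `$0`, and `$0` used as the start position converts to the number 0. Also, since awk is given no file, it reads the rest of the loop's standard input, i.e. the remaining lines of the file. The helper below, `awk_substr`, is hypothetical (not part of any library) and is modelled on the behaviour the 'SMb output suggests: positions count from 1, and requested positions outside the string are simply dropped.

```python
def awk_substr(s: str, start: int, length: int) -> str:
    """Hypothetical model of awk's substr(s, start, length), 1-based."""
    first = max(start, 1)        # positions below 1 do not exist, so drop them
    last = start + length - 1    # last requested position (inclusive)
    if last < first:
        return ""
    # Convert 1-based inclusive positions to a 0-based Python slice;
    # slicing past the end of s is clipped automatically.
    return s[first - 1:last]


field = "'SMb_TSS0303'"
print(awk_substr(field, 0, 5))   # positions 0..4 requested, only 1..4 exist: 'SMb
print(awk_substr(field, 2, 3))   # positions 2..4: SMb
```

The second call shows the start/length pair that skips the leading quote; each of the other fields needs its own start and length.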
What I am ultimately trying to get is something like this:
#!/bin/sh
file=$1
cat "$1"|while read line; do
parent=$(awk {'print substr($line,$0,5)'})
start=$(awk {'print substr($line,?,?)'})
end=$(awk {'print substr($line,?,?)'})
strand=$(awk {'print substr($line,?,?)'})
done
echo "$parent"   # should echo SMb
echo "$start"    # should echo 171765
echo "$end"      # should echo 171864
echo "$strand"   # should echo -
I suspect that the items in each line are being treated as one single string, or something like that? Maybe I am also parsing the file wrongly, but nothing I have tried works.
Upvotes: 2
Views: 539
The question was originally tagged python, so let me propose a Python solution:
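with open("input.txt") as f:
    for l in f:  # iterate over the lines of the open file
        # first 4 whitespace-separated fields, quotes stripped,
        # keeping only the part before the first "_"
        data = [x.strip("'").partition("_")[0] for x in l.split()[:4]]
        print("\n".join(data))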
It opens the file, splits each line on whitespace as awk would, keeps only the first 4 fields, strips off the quotes and, with partition("_"), keeps only the part before the first underscore (so SMb_TSS0303 becomes SMb). It then prints the list with one field per line.
That prints:
SMb
171765
171864
-
SMb
171758
171857
-
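The same steps can be wrapped in a small function that returns the four values the question asks for (parent, start, end, strand) instead of printing them, which is closer to the shell variables in the original script. The name `parse_record` and the inline sample lines are illustrative only; with a real file you would call it on each line of the opened file, as in the loop above.

```python
from typing import Tuple


def parse_record(line: str) -> Tuple[str, str, str, str]:
    """Return (parent, start, end, strand) from one line of the file."""
    # Split on whitespace like awk, keep the first four fields, strip quotes.
    fields = [x.strip("'") for x in line.split()[:4]]
    parent = fields[0].partition("_")[0]  # SMb_TSS0303 -> SMb
    start, end, strand = fields[1], fields[2], fields[3]
    return parent, start, end, strand


sample = [
    "'SMb_TSS0303' '171765' '171864' '-' 'NC_003078' 'SMb20154'",
    "'SMb_TSS0302' '171758' '171857' '-' 'NC_003078' 'SMb20154'",
]
for line in sample:
    parent, start, end, strand = parse_record(line)
    print(parent, start, end, strand)
```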
Upvotes: 1
It's really unclear exactly what you're trying to do, but I can at least help you with the awk syntax:
while read -r line
do
# substr() positions start at 1, so position 2 skips the leading quote
parent=$(echo "$line" | awk '{print substr($1,2,3)}')
start=$(echo "$line" | awk '{print substr($2,2,6)}')
echo "$parent"
echo "$start"
done < file
This outputs:
SMb
171765
SMb
171758
You should be able to figure out how to get the rest of the fields.
This is quite an inefficient way to do it, since it starts a new awk process for every field of every line, but based on the information in the question I can't provide a better answer at the moment.
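For the remaining fields, hard-coding a length per field is not necessary: `substr($2, 2, length($2) - 2)` starts after the opening quote and stops before the closing one whatever the field's width, so one pattern covers start, end and strand (and the four substr calls could live in a single awk program instead of one awk per field). To keep the added examples in one language, this is that position arithmetic mirrored in Python; `awk_style_fields` is a made-up name for illustration, not anything from awk itself.

```python
from typing import Tuple


def awk_style_fields(line: str) -> Tuple[str, str, str, str]:
    # Default awk field splitting: runs of whitespace separate $1, $2, ...
    f1, f2, f3, f4 = line.split()[:4]
    # awk's substr(s, m, n) counts from 1, so for m >= 1 it is s[m-1:m-1+n].
    parent = f1[1:1 + 3]              # substr($1, 2, 3)
    start = f2[1:1 + len(f2) - 2]     # substr($2, 2, length($2) - 2)
    end = f3[1:1 + len(f3) - 2]       # substr($3, 2, length($3) - 2)
    strand = f4[1:1 + len(f4) - 2]    # substr($4, 2, length($4) - 2)
    return parent, start, end, strand


print(*awk_style_fields("'SMb_TSS0303' '171765' '171864' '-' 'NC_003078' 'SMb20154'"), sep="\n")
```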
Upvotes: 1