Reputation: 147
I have a file containing information that I extract like this:
Command
cat 2018_02_15_09_01_08_result.tsv | grep -o [A-Z]\\*[0-9]*:[0-9]* | sort | uniq | sed -e 's/^/HLA-/' |tr '\n' ',' | sed '$ s/.$//'
Output
HLA-A*30:02,HLA-B*18:01,HLA-C*05:01
But when I try to save this in a variable, the asterisk and the letter before it disappear. I've tried several things (adding and removing commas, etc.) and I still can't get it to print properly.
hla=`cat 2018_02_15_09_01_08_result.tsv | grep -o [A-Z]\\*[0-9]*:[0-9]* | sort | uniq | sed -e 's/^/HLA-/' |tr '\n' ',' | sed '$ s/.$//'`
echo $hla
HLA-05:01,HLA-18:01,HLA-30:02
echo "$hla"
HLA-05:01,HLA-18:01,HLA-30:02
Upvotes: 0
Views: 2233
Reputation: 189936
There are multiple errors here, most of which will be aptly diagnosed by http://shellcheck.net/ without any human intervention.
You really should single-quote your regular expressions unless you specifically require the shell to perform wildcard expansion and whitespace tokenization on the regex before executing the command.
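The obsolescent `command` syntax in backticks introduces some unfortunate additional shell handling of the string inside the backticks: a \\ there is reduced to \ before the command is parsed, so the shell then treats \* as a quoted asterisk and grep receives a bare * instead of \*. The solution since the 1990s is to prefer the $(command) syntax for command substitution, which does not exhibit this problem.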
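Here is a small sketch of the difference between the two substitution syntaxes. The sample line is made up, since the real file isn't shown in the question; only the grep patterns come from the post.

```shell
# Hypothetical sample line; the real TSV layout is not shown in the question.
sample='x	A*30:02	y'

# Backticks: \\ is first reduced to \, then the shell reads \* as a quoted
# asterisk, so grep receives [A-Z]*[0-9]*:[0-9]* and never matches the
# literal asterisk.
old=`printf '%s\n' "$sample" | grep -o [A-Z]\\*[0-9]*:[0-9]*`

# $(...) with a single-quoted regex: grep receives [A-Z]\*[0-9]*:[0-9]* as written.
new=$(printf '%s\n' "$sample" | grep -o '[A-Z]\*[0-9]*:[0-9]*')

printf 'backticks:    %s\n' "$old"
printf 'dollar-paren: %s\n' "$new"
```

With this sample, the backtick version should print 30:02 and the $(...) version A*30:02, which mirrors the symptom in the question.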
The cat is useless; grep knows full well how to read a file.
Try this refactored code:
hla=$(grep -o '[A-Z]\*[0-9]*:[0-9]*' 2018_02_15_09_01_08_result.tsv |
    sort -u | sed -e 's/^/HLA-/' | tr '\n' ',' | sed '$ s/.$//')
echo "$hla"
The double quotes around the variable interpolation in the echo are necessary and useful; notice also the line wraps for legibility and the use of sort -u in preference over sort | uniq (and generally try to reduce the number of processes -- once I understand what the sed | tr | sed does I can probably propose a simplification for that, too). Perhaps the simplest fix would be to refactor all of this into a single Awk script, but without access to the input, it's hard to tell you in more detail what that might look like.
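As one possible simplification of the sed | tr | sed tail, here is a sketch that swaps in paste to join the lines, so there is no trailing comma to trim. The allele list is hypothetical and stands in for whatever grep -o extracts from the real file.

```shell
# Hypothetical alleles standing in for what grep -o extracts from the TSV
# (one duplicate included on purpose).
alleles='C*05:01
A*30:02
B*18:01
A*30:02'

# sort -u deduplicates, sed adds the prefix, and paste -s joins all lines
# with commas, so nothing needs to be trimmed afterwards.
hla=$(printf '%s\n' "$alleles" | sort -u | sed 's/^/HLA-/' | paste -sd, -)
echo "$hla"
```

This saves one process compared with tr plus a second sed, and it does not depend on how sed treats a last line that lacks a trailing newline.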
(Also, are you really sure you need to capture the value to a variable? Often variable=value; echo "$variable" is just an obscure and inefficient way to say echo "value". And variable=$(command); echo "$variable" is better written simply as command; capturing the command's standard output just so you can print it to standard output is a pure waste of cycles, unless you are planning to do something more with that variable's value.)
Upvotes: 2
Reputation: 147
I've solved it by saving the output of the command with a redirection:
cat 2018_02_15_09_01_08_result.tsv |
grep -o [A-Z]\\*[0-9]*:[0-9]* |
sort | uniq |
sed -e 's/^/HLA-/' |tr '\n' ',' | sed '$ s/.$//' > out_file
hla=`cat out_file`
echo $hla
which gets me the expected HLA-A*30:02,HLA-B*18:01,HLA-C*05:01. Not the ideal solution, but it works.
Upvotes: -1