HeyHoLetsGo

Reputation: 147

Asterisk in bash variable

I have a file containing information that I retrieve this way:

Command

cat 2018_02_15_09_01_08_result.tsv | grep -o [A-Z]\\*[0-9]*:[0-9]* | sort | uniq | sed -e 's/^/HLA-/'  |tr '\n' ',' | sed '$ s/.$//'

Output

HLA-A*30:02,HLA-B*18:01,HLA-C*05:01

But when I try to save this in a variable, the asterisk and the letter before it disappear. I've tried several variations (adding/removing commas etc.) and I still can't get it to print properly.

hla=`cat 2018_02_15_09_01_08_result.tsv | grep -o [A-Z]\\*[0-9]*:[0-9]* | sort | uniq | sed -e 's/^/HLA-/'  |tr '\n' ',' | sed '$ s/.$//'`

echo $hla
HLA-05:01,HLA-18:01,HLA-30:02
echo "$hla"
HLA-05:01,HLA-18:01,HLA-30:02

Upvotes: 0

Views: 2233

Answers (2)

tripleee

Reputation: 189936

There are multiple errors here, most of which will be aptly diagnosed by http://shellcheck.net/ without any human intervention.

  • You really should single-quote your regular expressions unless you specifically require the shell to perform wildcard expansion and whitespace tokenization on the regex before executing the command.

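  • The obsolescent `command` syntax in backticks applies an extra round of backslash processing to the string inside the backticks. Here it reduces \\* to \*, which the shell then turns into a bare * before grep ever sees it, so the regex no longer asks for a literal asterisk. The solution since the 1990s is to prefer the $(command) syntax for command substitution, which does not exhibit this problem.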

  • The cat is useless; grep knows full well how to read a file.
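
To see the difference the second bullet describes, this small sketch prints the argument each substitution form would hand to grep. It uses printf as a stand-in for grep, and the set -f is my addition so that files in the current directory cannot interfere with the unquoted pattern; neither is part of the original command.

```shell
# Turn off filename expansion so the unquoted pattern is passed through as typed.
set -f

# Backticks: \\ is reduced to \ first, then the shell reads \* as a plain *.
from_backticks=`printf '%s' [A-Z]\\*[0-9]*:[0-9]*`

# $(...): no extra pass, so \\ yields one literal backslash in the argument.
from_dollar_paren=$(printf '%s' [A-Z]\\*[0-9]*:[0-9]*)

printf 'backticks: %s\n' "$from_backticks"       # backticks: [A-Z]*[0-9]*:[0-9]*
printf 'dollar-paren: %s\n' "$from_dollar_paren" # dollar-paren: [A-Z]\*[0-9]*:[0-9]*

# Restore normal filename expansion.
set +f
```

With the backtick result, [A-Z]* means "zero or more capital letters" rather than "a capital letter and a literal asterisk", so the match can only begin at the digits and the A* part is dropped.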

Try this refactored code:

hla=$(grep -o '[A-Z]\*[0-9]*:[0-9]*' 2018_02_15_09_01_08_result.tsv |
  sort -u | sed -e 's/^/HLA-/' | tr '\n' ',' | sed '$ s/.$//')
echo "$hla"

The double quotes around the variable interpolation in the echo are necessary and useful. Notice also the line wraps for legibility and the use of sort -u in preference to sort | uniq. In general, try to reduce the number of processes; once I understand what the sed | tr | sed does, I can probably propose a simplification for that, too. Perhaps the simplest fix would be to refactor all of this into a single Awk script, but without access to the input, it's hard to tell you in more detail what that might look like.
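
As one possible simplification of the sed | tr | sed tail, paste -s can join the lines with commas directly, which avoids producing a trailing comma in the first place. This is only a sketch under assumptions: the sample lines below are made up, since the real .tsv is not shown, and the regex is the one the question intends (a capital letter, a literal asterisk, then digits, a colon and digits).

```shell
# Hypothetical stand-in for the .tsv file; the real column layout is unknown.
sample=$(printf 'id1\tA*30:02\tB*18:01\tC*05:01\nid2\tA*30:02\tB*18:01\tC*05:01\n')

hla=$(printf '%s\n' "$sample" |
  grep -o '[A-Z]\*[0-9]*:[0-9]*' |  # one allele per output line
  sort -u |                         # sort and drop duplicates in one process
  sed 's/^/HLA-/' |                 # add the prefix
  paste -sd, -)                     # join all lines with commas

echo "$hla"   # HLA-A*30:02,HLA-B*18:01,HLA-C*05:01
```

To run it against the real file, drop the printf stage and pass the file name to grep as in the refactored code above.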

(Also, are you really sure you need to capture the value in a variable? Often variable=value; echo "$variable" is just an obscure and inefficient way to say echo "value". Likewise, variable=$(command); echo "$variable" is better written simply as command. Capturing the command's standard output just so you can print it to standard output is a pure waste of cycles, unless you are planning to do something more with that variable's value.)

Upvotes: 2

HeyHoLetsGo

Reputation: 147

I've solved it by saving the output of the command with a redirection:

cat 2018_02_15_09_01_08_result.tsv |
grep -o [A-Z]\\*[0-9]*:[0-9]* |
sort | uniq |
sed -e 's/^/HLA-/'  |tr '\n' ',' | sed '$ s/.$//' > out_file
hla=`cat out_file`
echo "$hla"

which gets me the expected HLA-A*30:02,HLA-B*18:01,HLA-C*05:01. Not the ideal solution, but it works.

Upvotes: -1
