Enric Agud Pique

Reputation: 1105

create an array of names from a file

I would like to use bash commands to create an array of names. I have a file of this form:

<span class="username js-action-profile-name">@paulburt07</span>
<span class="username js-action-profile-name">@DavidWBrown7</span>
<span class="username js-action-profile-name">@MikeLarkan</span>
<span class="username js-action-profile-name">@WeathermanABC</span>
<span class="username js-action-profile-name">@JoshHoltTEN</span>
<span class="username js-action-profile-name">@TonyAuden</span>
<span class="username js-action-profile-name">@Magdalena_Roze</span>
<span class="username js-action-profile-name">@janesweather7</span>
<span class="username js-action-profile-name">@VanessaOHanlon</span>

And I need an array like

array=( "paulburt07" "DavidWBrown7" "MikeLarkan" "WeathermanABC" "JoshHoltTEN" "TonyAuden" "Magdalena_Roze" "janesweather7" "VanessaOHanlon" )

Any idea?

Upvotes: 0

Views: 66

Answers (2)

clt60

Reputation: 63912

One of many possible solutions:

array=($(grep -oP '@\K(.*)(?=<)' file))

EDIT: There is not much to explain: grep searches the file for the pattern defined by the regular expression (see man grep). The -o option prints only the matched parts, and -P tells grep to use Perl-style regexes.

The @\K(.*)(?=<) means:

  • search for and match @
  • \K: forget what has been matched so far (but remember the position)
  • (.*): match any string
  • (?=<): up to a following <, which is checked but not included in the match

The $(command) construct is called command substitution, and array=(...) assigns the resulting words to the array.
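The steps above can be tried on a small sample. This is only a minimal sketch, assuming GNU grep with -P support and bash 4 or newer (for mapfile); the temporary file and its two lines are made up for illustration. It differs from the one-liner in two ways: [^<]* replaces (.*)(?=<) so the match stops at the first <, and mapfile replaces the unquoted $(...) so each output line becomes exactly one array element.

```shell
#!/bin/bash
# Hypothetical sample input, written to a temporary file for the demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<span class="username js-action-profile-name">@paulburt07</span>
<span class="username js-action-profile-name">@Magdalena_Roze</span>
EOF

# -o prints only the matched part, -P enables Perl-style regexes.
# @\K drops the '@' from the reported match; [^<]* stops at the first '<'.
mapfile -t array < <(grep -oP '@\K[^<]*' "$sample")
rm -f "$sample"

# One name per line, to verify the result.
printf '%s\n' "${array[@]}"
```

With unquoted $(...) the shell would also split on spaces and expand glob characters, which does not matter for these names but can for other input.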

EDIT2: Because your original input probably contains more HTML tags, you can use an HTML parser, for example:

array=($(perl -Mojo -E 'say $_->text for x(b("filename.html")->slurp)->find(q{span[class~="username"]})->each'))

This prints the content of any <span class="username">...</span> in any HTML, regardless of its formatting. For the above you need to have Mojolicious installed.

Upvotes: 2

David C. Rankin

Reputation: 84561

It is fairly simple using sed and a tmp file:

#!/bin/bash

fname=${1:-htmlnames.txt}           # original html file
tmp=${2:-htmltmp.txt}               # temp file to use

sed -e 's/.*@//' "$fname" > "$tmp"  # remove up to '@' and place in temp
sed -i 's/[<].*$//' "$tmp"          # remove remainder in place in temp
namearray=( $(<"$tmp") )            # read temp file into array
rm "$tmp"                           # remove temp file

for i in "${namearray[@]}"; do        # print out to verify
    printf " %s\n" "$i"
done

exit 0

output:

alchemy:~/scr/tmp/stack/tmp> bash htmlnames.sh
 paulburt07
 DavidWBrown7
 MikeLarkan
 WeathermanABC
 JoshHoltTEN
 TonyAuden
 Magdalena_Roze
 janesweather7
 VanessaOHanlon
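The temp file is not strictly needed. Below is a minimal sketch of a variant, assuming bash 4 or newer for mapfile; the sample input is made up and held in a variable. It is not the script above: it uses a single sed substitution with a capture group in place of the two separate passes, and sed -n with the p flag so that lines without an @name are skipped.

```shell
#!/bin/bash
# Hypothetical sample input held in a variable instead of a file.
input='<span class="username js-action-profile-name">@JoshHoltTEN</span>
<span class="username js-action-profile-name">@TonyAuden</span>'

# Keep only the text between '@' and the next '<'.
# -n plus the trailing p prints only lines where the substitution matched.
mapfile -t namearray < <(printf '%s\n' "$input" | sed -n 's/.*@\([^<]*\)<.*/\1/p')

for i in "${namearray[@]}"; do        # print out to verify
    printf " %s\n" "$i"
done
```

To read from a file, replace the printf pipeline with sed -n '...' "$fname".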

Upvotes: 1
