X3nion

Reputation: 301

extract certain string from variable

I've got a text file containing the HTML source of a web page. Some of its lines contain data-adid="...", and these are the lines I'd like to capture. For that I use:

Id=$(grep -m 10 -A 1 "data-adid" Textfile)

to get the first ten results. The variable Id contains the following:

<arcicle class="aditem" data-adid="1234567890" <div class="aditem-image"> -- 
<arcicle class="aditem" data-adid="2134567890" <div class="aditem-image"> --
<arcicle class="aditem" data-adid="2134567890" <div class="aditem-image"> --
...

I would like to get the following output:

id="1234567890" id="2134567890" id="3124567890"

When using the grep command, I only manage to get the numbers, e.g.

Id2=$(echo $Id | grep -oP '(?<=data-adid=").*?(?=")')

gets 1234567890 2134567890 3124567890

When trying

Id2=$(echo $Id | grep -oP '(?<=data-ad).*?(?=")')

this will only give me id= id= id=

How could the code be changed to get the desired output?

Upvotes: 1

Views: 117

Answers (3)

Ed Morton

Reputation: 204558

With any sed:

$ sed 's/.*data-ad\(id="[^"]*"\).*/\1/' file
id="1234567890"
id="2134567890"
id="2134567890"
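
If the text is already in a shell variable, as in the question, the same substitution can read from the variable instead of a file. The following is only a sketch under assumptions: the sample value of Id is made up to resemble grep -A 1 output (including the -- separator lines), and paste is used to join the results onto one line. Adding -n and the p flag makes sed print only the lines where the substitution succeeded, so the separator and context lines are dropped.

```shell
#!/bin/sh
# Hypothetical sample, shaped like the output of: grep -m 10 -A 1 "data-adid" Textfile
Id='<article class="aditem" data-adid="1234567890"
<div class="aditem-image">
--
<article class="aditem" data-adid="2134567890"
<div class="aditem-image">'

# -n plus the trailing p prints only lines where the substitution matched;
# paste -s joins the surviving lines with single spaces.
out=$(printf '%s\n' "$Id" |
  sed -n 's/.*data-ad\(id="[^"]*"\).*/\1/p' |
  paste -sd' ' -)

printf '%s\n' "$out"
```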

Upvotes: 0

RavinderSingh13

Reputation: 133760

HTML should really be handled with tools that understand HTML, but since the OP asks for shell tools, I would go with awk for this one. Written and tested at https://ideone.com/EpU1aW

echo "$var" |
awk '
match($0,/data-adid="[^"]*"/){
  val=substr($0,RSTART,RLENGTH)
  sub(/^data-ad/,"",val)
  print val
  val=""
}
'
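
The script above reports the first match on each line. If a single line could hold several data-adid attributes (an assumption; the question's sample shows one per line), a loop over the remainder of the line is one way to collect them all. This is a sketch with a made-up one-line sample in var, not a tested replacement for the code above.

```shell
#!/bin/sh
# Hypothetical input: two attributes on the same line.
var='<a data-adid="1" x> <b data-adid="2" y>'

out=$(printf '%s\n' "$var" | awk '{
  line = $0
  # Keep matching until no data-adid attribute is left on the line.
  while (match(line, /data-adid="[^"]*"/)) {
    val = substr(line, RSTART, RLENGTH)
    sub(/^data-ad/, "", val)              # strip the prefix, keeping id="..."
    print val
    line = substr(line, RSTART + RLENGTH) # continue after this match
  }
}')

printf '%s\n' "$out"
```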

Upvotes: 2

KamilCuk

Reputation: 141890

The lookbehind (?<=data-ad) matches only after data-ad, so the lazy .*? stops before the first ", which yields id=. Match the id=" part as well, up to the next ". I also see no reason to use fancy lookarounds: just match the string and output only the part you want.

grep -oP 'data-ad\Kid="[^"]*"'

That should be enough. Note that $Id is unquoted, so it undergoes word splitting, and most probably should be quoted. Also, regular expressions cannot reliably parse HTML in general, so you should most probably use HTML-aware tools instead.
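
To make the quoting point concrete, here is a minimal sketch. It assumes GNU grep built with PCRE support (the -P option) and uses a made-up two-line value for Id. Quoting "$Id" keeps the line breaks intact, \K drops data-ad from the reported match, and paste joins the matches onto one line as the question asks.

```shell
#!/bin/sh
# Hypothetical sample input, modelled on the lines in the question.
Id='<article class="aditem" data-adid="1234567890" <div class="aditem-image">
<article class="aditem" data-adid="2134567890" <div class="aditem-image">'

# Quote "$Id" so the newlines survive; \K discards "data-ad" from the match.
Id2=$(printf '%s\n' "$Id" | grep -oP 'data-ad\Kid="[^"]*"' | paste -sd' ' -)

printf '%s\n' "$Id2"
```

printf is used here instead of echo only because its handling of backslashes and leading dashes is more predictable across shells.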

Upvotes: 2
