Parse HTML snippet with awk

Question

I am trying to parse an HTML document with awk.

The document contains several <div> blocks whose text content looks like this:

    287,489 people

    5 links


I am using

awk '/<div/,/<\/div>/'

to extract all such divs.

How can I get the number 287,489 from the first one?

I tried awk '/<\/span>/,/people/', but it does not work correctly.

iruvar · Accepted Answer

With gawk, and assuming that the only digits and commas within each <div> block occur in the numeric portion of interest:

awk -v RS='<[/]?div[^>]*>' '/span/ && /people/{gsub(/[^[:digit:],]/, ""); print}' file.txt
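To try this locally, here is a minimal sketch under stated assumptions. The question's real tags and class names were not preserved, so the sample markup below is hypothetical. The accepted command is wrapped in a shell function (the names people_count_rs and people_count_match are hypothetical too). The sketch also adds a line-oriented variant that uses POSIX match() to print only the digits directly before " people", so it does not rely on the block containing no other digits.

```shell
#!/bin/sh
# Hypothetical sample: the question's real tags and class names were lost,
# so this markup is an assumption made only for illustration.
sample='<div class="stats">
  <span class="count">
    287,489 people
  </span>
</div>
<div class="stats">
  <span class="count">
    5 links
  </span>
</div>'

# The accepted answer, wrapped in a (hypothetically named) function that reads
# HTML on stdin. RS is a regex, so every <div ...> or </div> tag ends a record
# and each block's contents become one record.
people_count_rs() {
  awk -v RS='<[/]?div[^>]*>' '/span/ && /people/{gsub(/[^[:digit:],]/, ""); print}'
}

# Line-oriented variant (also hypothetically named): POSIX match() sets RSTART
# and RLENGTH for "digits/commas followed by ' people'"; dropping the last
# 7 characters (" people") leaves just the number.
people_count_match() {
  awk 'match($0, /[0-9][0-9,]* people/) { print substr($0, RSTART, RLENGTH - 7) }'
}

printf '%s\n' "$sample" | people_count_rs
printf '%s\n' "$sample" | people_count_match
```

For this sample, both functions print 287,489 when run with an awk that accepts a regular-expression RS and POSIX character classes, such as gawk. The match() variant avoids the answer's assumption about other digits in the block. It does, however, require the number and the word people to be on the same line, as they are in the question's snippet.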
