Extract text from HTML based on table column via Shell Script

Question

I need to write a shell script that reads an HTML file, sample.html, and extracts data from one table column based on the value in another table column. For example, this is the HTML code:


  
    
      
      core6690.myserverdomain.com 
    
    
      
      admin
    
  
  
    
      
      core6691.myserverdomain.com 
    
    
      
      secondary 
    
  
  
    
      
      core6692.myserverdomain.com 
    
    
      
      primary

Let's say that I want to determine the URL for "admin": the result would be "core6690.myserverdomain.com". If my input is "primary", the output would be "core6692.myserverdomain.com", and so on.

The HTML page has a lot more data, header tags, footer stuff, etc., but the important stuff that I am looking for is placed inside a table with the exact same structure I list in the code... except it has many more rows, not necessarily just 3 as in this example.

I have seen related answers on this site that use sed, grep, regular expressions, awk, and other tools; however, none of them are close enough to what I am looking for, and I do not have enough experience with any of these approaches to modify them to fit my needs.

Any suggestions? Thanks in advance.

servn · Accepted Answer

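#!/bin/bash
# Usage: ./filter.sh LABEL
# Prints the value that precedes LABEL among the lines ending in </div>.

for i in $(grep '</div>' sample.html | sed 's/\s\+//' | sed 's/<.*>//'); do
    if [ "$i" = "$1" ]; then
        echo "$prev"
    fi
    prev=$i
done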

Example usage:

$ ./filter.sh primary
core6692.myserverdomain.com

P.S.: The format of sample.html must be exactly as you posted it here: the server and the name must each start with whitespace or a tab and end with a closing tag (the script greps for </div>).
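For comparison, the same lookup can be sketched as a single awk pass, which avoids relying on the shell's word splitting and takes the file name as an argument. This is only a sketch under the same assumption as the script above: each value sits on its own line, is followed by a closing </div>, and the server line comes before its label line. The function name lookup_host is made up for illustration.

```shell
#!/bin/bash
# lookup_host LABEL FILE   (hypothetical helper, not from the original answer)
# Assumption: each value is on its own line ending in </div>, and the
# server line appears immediately before the matching label line.
lookup_host() {
    awk -v want="$1" '
        /<\/div>/ {
            gsub(/<[^>]*>/, "")            # remove every tag on the line
            gsub(/^[ \t]+|[ \t]+$/, "")    # trim leading/trailing whitespace
            if ($0 == want) print prev     # label found: print the value before it
            prev = $0                      # remember this value for the next line
        }
    ' "$2"
}
```

It would be called as `lookup_host primary sample.html`. If the real page wraps the values differently, the `/<\/div>/` pattern is the part that would need adjusting.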
