user2466087

Grep data within multiple tags from cURL HTML

I'm getting rather desperate trying to understand how to extract the data I want from the output of a curl command.

I need a hand writing a grep command that gets the text out of the following HTML:

<title> timetable </t itle>< <h3>study table</h3> <p>< strong>biology <div> <table
width='100%' cellpadding='5' cellspacing='0'><tr><th colspan="3">Level 44 Building 1 <tr> 
<td >monday</td> <td >1:30 – 2:30</td> <td >< a>Room number 22</a></td> <td >&nbsp;</td>
</tr> <tr><th colspan="2">body> </html>

I would like the output to look like:

timetable
study table
Biology
Level 44 Building 1
Monday
1:30 - 2:30 
Room Number 22

Currently I only know how to do a single grep, such as:

grep 'href='
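The main step from a single grep like the one above to extracting text is the -o flag: plain grep prints every whole line that matches, while grep -o prints only the matched text, one match per line. A minimal sketch, assuming a grep that supports -o and -E (GNU grep does); tag_text is a hypothetical helper name and the HTML is a made-up fragment, not the full page:

```shell
# tag_text is a hypothetical helper name; the sample below is a made-up fragment.
tag_text() {
    # -o: print only the matched text, one match per line, not the whole line
    # -E: extended regex, so (h3|td) matches either tag in a single grep call
    # [^<&]+: at least one character that is not '<' or '&', so empty cells
    #         and &nbsp; cells produce no match
    # sed then removes the opening tag that -o kept at the start of each match
    grep -oE '<(h3|td)[^>]*>[^<&]+' | sed 's/^<[^>]*>//'
}

html='<h3>study table</h3> <td >monday</td> <td >&nbsp;</td>'
printf '%s\n' "$html" | tag_text
# prints two lines: "study table" and "monday"
```

Adding more tag names to the (h3|td) alternation extends the same single grep to other elements.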

Answers (2)

Chris Seymour

If you have GNU grep:

$ grep -Po '(?<=>) ?\K[^<&>]{2,}(?=<)' file
timetable 
study table
biology 
Level 44 Building 1 
monday
1:30 – 2:30
Room number 22

Disclaimer: You should really use a proper parser for this.
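The pattern in this answer can be read piece by piece. The sketch below wraps it in a function (extract_text is a hypothetical name) and runs it on the sample HTML from the question. It assumes GNU grep built with PCRE support for -P; the extra sed step is not part of the answer and only trims the trailing space some matches keep.

```shell
# extract_text is a hypothetical helper name. Assumes GNU grep with -P (PCRE) support.
extract_text() {
    # (?<=>)      the match must start right after a '>' (the end of a tag)
    #  ?          optionally consume one space after that '>' ...
    # \K          ... and leave it out of the printed match
    # [^<&>]{2,}  then 2+ characters that are not '<', '&' or '>'; the minimum of 2
    #             stops the single spaces between tags from matching, and excluding
    #             '&' skips entities such as &nbsp;
    # (?=<)       the text must end exactly where the next tag starts
    # -o prints each match on its own line; sed trims trailing spaces ("timetable ").
    grep -Po '(?<=>) ?\K[^<&>]{2,}(?=<)' | sed 's/ *$//'
}

# Sample HTML copied from the question (grep works line by line).
html=$(cat <<'EOF'
<title> timetable </t itle>< <h3>study table</h3> <p>< strong>biology <div> <table
width='100%' cellpadding='5' cellspacing='0'><tr><th colspan="3">Level 44 Building 1 <tr>
<td >monday</td> <td >1:30 – 2:30</td> <td >< a>Room number 22</a></td> <td >&nbsp;</td>
</tr> <tr><th colspan="2">body> </html>
EOF
)

printf '%s\n' "$html" | extract_text
```

With live data the function can sit at the end of the curl pipeline, for example curl -s "$url" | extract_text, where $url is a placeholder for the page address and -s silences curl's progress output.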

cforbish

Assuming the HTML is stored in the variable $data, you can do this in bash:

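IFS=$'\n'
# Unquoted $data is split on newlines and echo rejoins the pieces with spaces,
# so tags that span two lines become whole; then remove every entity (&nbsp; etc.)
result=$(echo $data | sed 's/&[^;]*;//g')
# Replace each tag with a line break (\n in the replacement needs GNU sed)
result=$(echo "$result" | sed 's/<[^>]*>/\n/g')
# With IFS=$'\n' the loop runs once per line; skip lines that are only spaces
for string in $result; do
    if [[ ! $string =~ ^\ *$ ]]; then
        echo "$string"
    fi
done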
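The same idea (join the lines, delete entities, turn every tag into a line break, drop the blank pieces) can also be sketched as a single pipeline that does not change IFS. This assumes GNU sed, since \n in an s/// replacement is a GNU extension; strip_tags is a hypothetical helper name and the sample is a made-up fragment with a tag split across two lines:

```shell
# strip_tags is a hypothetical helper name. Assumes GNU sed (\n in the replacement).
strip_tags() {
    # 1. tr joins all lines, so a tag split across lines (<table\nwidth=...>) is matched whole.
    # 2. Delete every entity such as &nbsp; (the g flag removes all of them on the line).
    # 3. Replace every tag with a newline.
    # 4. Trim leading/trailing spaces and drop lines that end up empty.
    tr '\n' ' ' \
        | sed -e 's/&[^;]*;//g' -e 's/<[^>]*>/\n/g' \
        | sed -e 's/^ *//' -e 's/ *$//' -e '/^$/d'
}

html=$(cat <<'EOF'
<div> <table
width='100%'><tr><td >monday</td> <td >&nbsp;</td>
<td >< a>Room number 22</a></td></tr></table>
EOF
)

printf '%s\n' "$html" | strip_tags
# prints two lines: "monday" and "Room number 22"
```

On the question's full sample, this kind of tag stripping also prints a stray body> line, because the sample's closing body tag is malformed and has no leading '<'.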
