Zoro99

Reputation: 185

Formatting HTTP GET request output in Python

I am trying to read some data from our internal web-page using the following code:

import requests
from requests_toolbelt.utils import dump

resp = requests.get('XXXXXXXXXXXXXXXX')
data = dump.dump_all(resp)
print(data.decode('utf-8'))

The output I get is in the following format:

<tr> 
    <td bgcolor="#FFFFFF"><font size=2><a     
href=javascript:openwin(179)>Kevin</a></font></td>
    <td bgcolor="#FFFFFF"><font size=2>45.50/week</font></td>
  </tr>

  <tr> 
    <td bgcolor="#FFFFFF"><font size=2><a  
href=javascript:openwin(33)>Eliza</a></font></td>
    <td bgcolor="#FFFFFF"><font size=2>220=00/week</font></td>
  </tr>

  <tr> 
    <td bgcolor="#FFFFFF"><font size=2><a href=javascript:openwin(97)>sam</a></font></td>
    <td bgcolor="#FFFFFF"><font size=2>181=00</font></td>
  </tr>

However, the only data I am interested in from the output above is the names and the values, e.g.:

Kevin 45.50/week
Eliza 220=00/week
Sam 181=00

Is there any module or other way to format this output as required and write it to a file (preferably Excel)?

Upvotes: 0

Views: 99

Answers (1)

Shane

Reputation: 2391

Try BeautifulSoup:

from bs4 import BeautifulSoup as soup

content = """<tr> 
    <td bgcolor="#FFFFFF"><font size=2><a     
href=javascript:openwin(179)>Kevin</a></font></td>
    <td bgcolor="#FFFFFF"><font size=2>45.50/week</font></td>
  </tr>

  <tr> 
    <td bgcolor="#FFFFFF"><font size=2><a  
href=javascript:openwin(33)>Eliza</a></font></td>
    <td bgcolor="#FFFFFF"><font size=2>220=00/week</font></td>
  </tr>

  <tr> 
    <td bgcolor="#FFFFFF"><font size=2><a href=javascript:openwin(97)>sam</a></font></td>
    <td bgcolor="#FFFFFF"><font size=2>181=00</font></td>
  </tr>"""

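# The 'lxml' parser requires the lxml package to be installed.
html = soup(content, 'lxml')
trs = html.find_all('tr')

for row in trs:
    # Each <tr> holds one record; its <td> cells hold the name and the value.
    tds = row.find_all('td')

    for data in tds:
        # Print the cell text followed by a space so both cells share a line.
        print(data.text.strip(), end=' ')

    # End the current line and leave a blank line between records.
    print('\n')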

The output:

Kevin 45.50/week 

Eliza 220=00/week 

sam 181=00 

First find all <tr> tags with find_all('tr'), then find all <td> tags inside each row with find_all('td'), and finally print the text content of each cell with data.text.
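To get the result into a file that Excel can open, one option is to collect the cell text into rows instead of printing it, for example with [td.text.strip() for td in row.find_all('td')] for each <tr>, and then write those rows with the standard csv module. The following is only a minimal sketch under some assumptions: the rows are hard-coded from the output shown above, and the function name write_rows, the column names "name" and "value", and the file name "output.csv" are made up for illustration.

```python
import csv
import io

# Rows as they would be collected from the parsed table, one list per <tr>.
# Hard-coded here from the output shown above so the sketch runs on its own.
rows = [
    ["Kevin", "45.50/week"],
    ["Eliza", "220=00/week"],
    ["sam", "181=00"],
]


def write_rows(rows, fileobj):
    """Write a header line plus one CSV line per table row."""
    writer = csv.writer(fileobj)
    writer.writerow(["name", "value"])  # hypothetical column names
    writer.writerows(rows)


# In-memory demo. For a real file, pass
# open("output.csv", "w", newline="") instead ("output.csv" is a placeholder).
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

A .csv file opens directly in Excel. If a real .xlsx workbook is needed, the same rows could be passed to a spreadsheet library instead.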

Upvotes: 1
