I have this piece of code:
for t in tables:
    print ""
    my_table = t
    rows = my_table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        i = 0
        for td in cols:
            text = str(td.text).strip()
            print "{}{}".format(text if text !="" else "IP","|"),
            i=i+1
            if i == 2:
                print ""
                i = 0
    pass
"tables" is a list of tables in HTML format. I am using BeautifulSoup to parse them.
Currently, the output that I get is:
Interface in| port-channel8.53|
IP| 172.18.153.126/255.255.255.252|
Router| bob|
Route| route: 192.168.178.0/255.255.128.0 gw 172.18.145.106|
Interface out| Ethernet2/5.103|
IP| 172.18.145.105/255.255.255.252|
What I want to get is:
Interface in | port-channel8.53                                    |
IP           | 172.18.153.126/255.255.255.252                      |
Router       | bob                                                 |
Route        | route: 192.168.178.0/255.255.128.0 gw 172.18.145.106|
Interface out| Ethernet2/5.103                                     |
IP           | 172.18.145.105/255.255.255.252                      |
"Placeholder"| another ip in the same td as the one up             |
"Placeholder"| another ip in the same td as the one up             |
How can I get this output?
EDIT:
Here is how one table is built:
<table>
<tr>
<td>Interface in</td>
<td>Vlan800 (bob)</td>
</tr>
<tr>
<td></td>
<td>172.26.128.3/255.255.255.224<br></br></td>
</tr>
<tr>
<td>Router</td>
<td>bob2</td>
</tr>
<tr>
<td>Route</td>
<td>route: 0.0.0.0/0.0.0.0 gw 172.26.144.241</td>
</tr>
<tr>
<td>Interface out</td>
<td>Vlan1145 (bob3)</td>
</tr>
<tr>
<td></td>
<td>172.26.144.245/255.255.255.240<br></br></td>
</tr>
</table>
(Yes, the empty <td> cells are on the real page.)
EDIT 2: Problematic code (several IPs in a single <td>):
<td>
195.233.112.4/255.255.255.0<br>
195.233.112.15/255.255.255.0<br>
195.233.112.3/255.255.255.0<br>
<br><br><br></td>
EDIT 3:
Sample code 2 (which causes problems with the proposed solutions):
<table class="nitrestable">
<tr>
<td>Interface in</td>
<td>GigabitEthernet1/1.103 (*global)</td>
</tr>
<tr>
<td></td>
<td>172.18.145.106/255.255.255.252<br></br></td>
</tr>
<tr>
<td>Router</td>
<td>*grt</td>
</tr>
<tr>
<td>Route</td>
<td>route: 172.18.145.106/255.255.255.128 gw 172.18.145.106</td>
</tr>
<tr>
<td>Interface out</td>
<td>Vlan71 (*global)</td>
</tr>
<tr>
<td></td>
<td>172.18.145.106/255.255.255.0<br>
172.18.146.106/255.255.255.0<br>
172.18.147.106/255.255.255.0<br></br></br></br></td></tr>
</table>
It helps to parse the rows/columns into lists first and evaluate them afterwards. That makes it simple to compute the maximum width of each column (w1 and w2 in the code). As the others said, once those widths are known, str.format() is what you want.
for t in tables:
    col = [[], []]  # col[0]: labels, col[1]: values
    for tr in t.findAll('tr'):
        tds = tr.findAll('td')
        if len(tds) != 2:
            continue
        label = tds[0].text.strip() or "IP"
        # td.text drops the <br> tags, so a '<br>' check on it never matches;
        # take the cell's separate text nodes instead (one per IP)
        values = [s.strip() for s in tds[1].findAll(text=True) if s.strip()]
        for value in values or [""]:
            col[0].append(label)  # repeat the label for every extra IP
            col[1].append(value)
    w1 = max([len(x) for x in col[0]])
    w2 = max([len(x) for x in col[1]])
    for i in range(len(col[1])):
        s = '{: <{}}|{: <{}}|'.format(col[0][i], w1, col[1][i], w2)
        print(s)
To explain the str.format() call: '{: <{}}'.format(x, y) creates a left-aligned string of width y from the text x, padded with spaces.
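To see the nested width field in isolation, here is a minimal sketch (pad_left_aligned is a hypothetical helper name, not part of the answer). It also shows that a width smaller than the text never truncates it, which is why the widths are taken as the maximum over each column.

```python
def pad_left_aligned(text, width):
    # '{: <{}}': fill with ' ', align '<' (left), and take the width
    # from the second positional argument through the nested {}
    return '{: <{}}'.format(text, width)


print(repr(pad_left_aligned("IP", 6)))      # 'IP    '
print(repr(pad_left_aligned("Router", 3)))  # 'Router' (not truncated)
```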
Edit: added parsing of multiple IPs, i.e. any field where the second column is separated with <br>.
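The code above relies on BeautifulSoup. To try the multi-IP handling without it, here is a minimal sketch that swaps in the standard library's html.parser (Python 3) instead; TableCells, expand_rows and render are hypothetical names. It assumes each relevant <tr> has exactly two <td> cells, and, like the answer, it repeats the label for every extra IP in a cell.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect each <tr> as a list of cells; each cell is a list of the
    non-blank text fragments in it (text on either side of a <br> arrives
    as separate data chunks)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = []
            self.rows[-1].append(self._cell)

    def handle_endtag(self, tag):
        if tag == "td":
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None and data.strip():
            self._cell.append(data.strip())


def expand_rows(rows, placeholder="IP"):
    """One (label, value) pair per fragment of the second cell;
    an empty label becomes `placeholder`."""
    pairs = []
    for cells in rows:
        if len(cells) != 2:
            continue
        label = " ".join(cells[0]) or placeholder
        for value in cells[1] or [""]:
            pairs.append((label, value))
    return pairs


def render(pairs):
    w1 = max((len(label) for label, _ in pairs), default=0)
    w2 = max((len(value) for _, value in pairs), default=0)
    return "\n".join("{: <{}}| {: <{}}|".format(l, w1, v, w2) for l, v in pairs)


SAMPLE = """<table>
<tr><td></td><td>172.18.145.106/255.255.255.0<br>
172.18.146.106/255.255.255.0<br>
172.18.147.106/255.255.255.0<br></br></br></br></td></tr>
<tr><td>Router</td><td>*grt</td></tr>
</table>"""

parser = TableCells()
parser.feed(SAMPLE)
print(render(expand_rows(parser.rows)))
```

If you prefer the "Placeholder" label from the question for the extra IPs, only the inner loop of expand_rows needs to change.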
This is a 'simpler' script. Look up the built-in enumerate() function in Python.
import BeautifulSoup
raw_str = \
'''
<table>
<tr>
<td>Interface in</td>
<td>Vlan800 (bob)</td>
</tr>
<tr>
<td></td>
<td>172.26.128.3/255.255.255.224<br></br></td>
</tr>
<tr>
<td>Router</td>
<td>bob2</td>
</tr>
<tr>
<td>Route</td>
<td>route: 0.0.0.0/0.0.0.0 gw 172.26.144.241</td>
</tr>
<tr>
<td>Interface out</td>
<td>Vlan1145 (bob3)</td>
</tr>
<tr>
<td></td>
<td>172.26.144.245/255.255.255.240<br></br></td>
</tr>
</table>
'''
org_str = \
'''
Interface in| port-channel8.53|
IP| 172.18.153.126/255.255.255.252|
Router| bob|
Route| route: 192.168.178.0/255.255.128.0 gw 172.18.145.106|
Interface out| Ethernet2/5.103|
IP| 172.18.145.105/255.255.255.252|
'''
print org_str
soup = BeautifulSoup.BeautifulSoup(raw_str)
tables = soup.findAll('table')
for cur_table in tables:
    print ""
    col_sizes = {}
    # Figure out the column sizes
    for tr in cur_table.findAll('tr'):
        tds = tr.findAll('td')
        cur_col_sizes = {col : max(len(td.text), col_sizes.get(col, 0)) for (col, td) in enumerate(tds)}
        col_sizes.update(cur_col_sizes)
    # Print the data, padded using the detected column sizes
    for tr in cur_table.findAll('tr'):
        tds = tr.findAll('td')
        line_strs = [("%%-%ds" % col_sizes[col]) % (td.text or "IP") for (col, td) in enumerate(tds)]
        line_str = "| %s |" % " | ".join(line_strs)
        print line_str
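The script builds each format string with "%%-%ds" % width. As a small variation, printf-style formatting also accepts * as the width and takes it from the argument tuple. The sketch below (Python 3; format_rows is a hypothetical name, and rows is assumed to already hold the cell texts pulled out of each <tr>) keeps the same two passes over the rows, and measures len(text or "IP") so the "IP" placeholder is counted when the widths are computed.

```python
def format_rows(rows, placeholder="IP"):
    # First pass: widest cell per column, keyed by column index via enumerate
    col_sizes = {}
    for cells in rows:
        for col, text in enumerate(cells):
            col_sizes[col] = max(len(text or placeholder), col_sizes.get(col, 0))
    # Second pass: '%-*s' reads the width from the tuple, so no
    # '%-12s'-style format string has to be built first
    lines = []
    for cells in rows:
        parts = ["%-*s" % (col_sizes[col], text or placeholder)
                 for col, text in enumerate(cells)]
        lines.append("| %s |" % " | ".join(parts))
    return lines


rows = [["Interface in", "Vlan800 (bob)"],
        ["", "172.26.128.3/255.255.255.224"],
        ["Router", "bob2"]]
for line in format_rows(rows):
    print(line)
```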
You can supply a format specifier, e.g.
print "{0:14}|".format(text or "IP"),
or pad the string you're passing to format with str.ljust:
print "{}|".format(str.ljust(text or "IP", 14)),
However, as dilbert has just pointed out in the comments, you will need some way to work out the width each column requires.
Note that, as the empty string "" evaluates False in a boolean context, you can simplify your if condition to text or "IP", and as the pipe '|' never changes you can put it in the template directly.
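Putting those points together with a computed width instead of the fixed 14, a minimal sketch (Python 3; label_column is a hypothetical name, and pairs is assumed to hold (label, value) texts already extracted from the table) might look like this:

```python
def label_column(pairs):
    # Work out the width the first column needs; empty labels print as "IP"
    width = max((len(label or "IP") for label, _ in pairs), default=0)
    lines = []
    for label, value in pairs:
        # "" is falsy, so `label or "IP"` replaces the `text != ""` test,
        # and the constant '|' sits in the template itself.
        # '{0:{1}}' pads to `width`; strings are left-aligned by default.
        lines.append("{0:{1}}| {2}".format(label or "IP", width, value))
    return lines


pairs = [("Interface in", "port-channel8.53"),
         ("", "172.18.153.126/255.255.255.252"),
         ("Router", "bob")]
for line in label_column(pairs):
    print(line)
```

str.ljust gives the same padding: (label or "IP").ljust(width) produces exactly the first field above.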