I am trying to extract some data from a site using Beautiful Soup, specifically a table whose rows and cells are stored in div tags rather than the usual table tag. This means I cannot use the pandas read_html function to simply extract all the tables.
Here is the HTML I extracted:
<div class="block">
<div class="expand">
<div class="expand-button collapsed" data-toggle="collapse">Forex</div>
<div class="panel-collapse collapse">
<div class="table">
<div class="search">
<div class="date"></div>
<div class="group search">
<span>Search </span>
<input class="search-box" type="search"/>
</div>
<div class="group ">
<span class="label"></span>
<span class="toggle-a"> </span>
<span class="toggle-b"> </span>
</div>
</div>
<div class="skin">
<div class="table visible">
<div class="header">
<div>Product</div>
<div>Account A</div>
</div>
<div class="column-header">
<div class="column-name">NAME</div>
<div class="column-name">DESCRIPTION</div>
<div class="column-name">Value1</div>
<div class="column-name">Value2</div>
<div class="column-name">Value3</div>
<div class="column-name">Value4</div>
</div>
<div class="table-row">
<div class="table-cell c1">bronze</div>
<div class="table-cell c2">3rd tier</div>
<div class="table-cell c3">0</div>
<div class="table-cell c4">1</div>
<div class="table-cell c5">1</div>
<div class="table-cell c6">1</div>
<div class="table-cell c-true">Account A</div>
<div class="table-cell c-standard">Account B</div>
</div>
<div class="table-row">
<div class="table-cell c1">silver</div>
<div class="table-cell c2">2nd tier</div>
<div class="table-cell c3">1</div>
<div class="table-cell c4">0</div>
<div class="table-cell c5">3</div>
<div class="table-cell c6">0</div>
<div class="table-cell c-true">Account A</div>
<div class="table-cell c-standard">Account B</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
And this is what I want at the end:
| Product | | Account A | | Account B | |
|---------|-------------|-----------|---------|-----------|---------|
| NAME | DESCRIPTION | Value 1 | Value 2 | Value 3 | Value 4 |
| bronze | 3rd tier | 0 | 1 | 1 | 1 |
| silver | 2nd tier | 1 | 0 | 3 | 0 |
Is there a simple way to do this using Python or Beautiful Soup?
Upvotes: 2
Views: 2337
Reputation: 3400
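Here is code to pull the data out of the given div tags. It assumes the markup from the question is stored in a string named html:
from bs4 import BeautifulSoup

# html is the markup string shown in the question
soup = BeautifulSoup(html, "html.parser")

# Column names: one entry per child div of the column-header row
first_row = [div.get_text(strip=True) for div in soup.select("div.column-header > div")]

# Data rows: keep only the first six cells (the last two hold the account labels)
rows = []
for row in soup.select("div.table-row"):
    rows.append([cell.get_text(strip=True) for cell in row.select("div.table-cell")[:6]])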
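To generate the table, you can install BeautifulTable (pip install beautifultable):
from beautifultable import BeautifulTable

table = BeautifulTable()
# Older beautifultable API; releases from 1.0 on use table.columns.header and table.rows.append
table.column_headers = ["Product", "", "Account A", "", "Account B", ""]
table.append_row(first_row)
for row in rows:
    table.append_row(row)
print(table)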
Output:
+---------+-------------+-----------+--------+-----------+--------+
| Product | | Account A | | Account B | |
+---------+-------------+-----------+--------+-----------+--------+
| NAME | DESCRIPTION | Value1 | Value2 | Value3 | Value4 |
+---------+-------------+-----------+--------+-----------+--------+
| bronze | 3rd tier | 0 | 1 | 1 | 1 |
+---------+-------------+-----------+--------+-----------+--------+
| silver | 2nd tier | 1 | 0 | 3 | 0 |
+---------+-------------+-----------+--------+-----------+--------+
You can also change how the table looks by formatting the same data with the tabulate library.
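With tabulate, something like tabulate(rows, headers=first_row, tablefmt="github") should give a pipe-delimited layout close to the one in the question. If you would rather not install another package, the same layout can be sketched in plain Python. The helper name to_pipe_table below is made up for illustration, and the values are copied by hand from the output above rather than parsed:

```python
def to_pipe_table(header, rows):
    """Render a header and rows as a pipe-delimited table (hypothetical helper)."""
    table = [list(map(str, header))] + [list(map(str, r)) for r in rows]
    # Each column is as wide as its longest cell
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]

    def fmt(row):
        # Left-align every cell to its column width
        return "| " + " | ".join(cell.ljust(w) for cell, w in zip(row, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(table[0]), separator] + [fmt(r) for r in table[1:]])


# Values copied from the answer's output (assumed to match what the parser returns)
first_row = ["NAME", "DESCRIPTION", "Value1", "Value2", "Value3", "Value4"]
rows = [
    ["bronze", "3rd tier", "0", "1", "1", "1"],
    ["silver", "2nd tier", "1", "0", "3", "0"],
]
print(to_pipe_table(first_row, rows))
```

This only handles a single header row; the two-level "Product / Account A / Account B" header from the question would need an extra line added by hand.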
Upvotes: 4