user15915597

Extract table data from HTML where rows are stored in divs using Python

I am trying to extract some data from a site using Beautiful Soup, specifically a table whose rows and cells are stored in div tags rather than the usual table tag. This means I cannot use the pandas read_html function to simply extract all the tables.

Here is the HTML I extracted:

<div class="block">
<div class="expand">
<div class="expand-button collapsed" data-toggle="collapse">Forex</div>
<div class="panel-collapse collapse">
<div class="table">
<div class="search">
<div class="date"></div>
<div class="group search">
<span>Search </span>
<input class="search-box" type="search"/>
</div>
<div class="group ">
<span class="label"></span>
<span class="toggle-a"> </span>
<span class="toggle-b"> </span>
</div>
</div>
<div class="skin">
<div class="table visible">
<div class="header">
<div>Product</div>
<div>Account A</div>
</div>
<div class="column-header">
<div class="column-name">NAME</div>
<div class="column-name">DESCRIPTION</div>
<div class="column-name">Value1</div>
<div class="column-name">Value2</div>
<div class="column-name">Value3</div>
<div class="column-name">Value4</div>
</div>
<div class="table-row">
<div class="table-cell c1">bronze</div>
<div class="table-cell c2">3rd tier</div>
<div class="table-cell c3">0</div>
<div class="table-cell c4">1</div>
<div class="table-cell c5">1</div>
<div class="table-cell c6">1</div>
<div class="table-cell c-true">Account A</div>
<div class="table-cell c-standard">Account B</div>
</div>
<div class="table-row">
<div class="table-cell c1">silver</div>
<div class="table-cell c2">2nd tier</div>
<div class="table-cell c3">1</div>
<div class="table-cell c4">0</div>
<div class="table-cell c5">3</div>
<div class="table-cell c6">0</div>
<div class="table-cell c-true">Account A</div>
<div class="table-cell c-standard">Account B</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>

And this is what I want at the end:

| Product |             | Account A |         | Account B |         |
|---------|-------------|-----------|---------|-----------|---------|
| NAME    | DESCRIPTION | Value 1   | Value 2 | Value 3   | Value 4 |
| bronze  | 3rd tier    | 0         | 1       | 1         | 1       |
| silver  | 2nd tier    | 1         | 0       | 3         | 0       |

Is there a simple way to do this using Python or Beautiful Soup?

Upvotes: 2

Views: 2337

Answers (1)

Bhavya Parikh

Reputation: 3400

Here is code to extract the data from the given HTML tags. It assumes your markup is stored as a string in the variable html.

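from bs4 import BeautifulSoup

# `html` is assumed to hold the markup from the question as a string
soup = BeautifulSoup(html, "html.parser")
# Column names: the text of the column-header div, one label per line
first_row = soup.find("div", attrs={"class": "column-header"}).text.strip("\n").split("\n")
# Data rows: keep the first six cells, dropping the trailing "Account A"/"Account B" cells
rows = []
for row in soup.select("div[class=table-row]"):
    rows.append(row.text.strip("\n").split("\n")[:6])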
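Splitting the text on newlines depends on every cell sitting on its own line in the source. A possible alternative, sketched here under the assumption that the real page uses the same class names as the question, is to read each cell element individually. The sample markup is a trimmed-down copy of the question's HTML, included only so the snippet runs on its own.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample of the markup from the question (assumed structure).
html = """
<div class="table visible">
  <div class="column-header">
    <div class="column-name">NAME</div>
    <div class="column-name">DESCRIPTION</div>
    <div class="column-name">Value1</div>
  </div>
  <div class="table-row">
    <div class="table-cell c1">bronze</div>
    <div class="table-cell c2">3rd tier</div>
    <div class="table-cell c3">0</div>
    <div class="table-cell c-true">Account A</div>
  </div>
  <div class="table-row">
    <div class="table-cell c1">silver</div>
    <div class="table-cell c2">2nd tier</div>
    <div class="table-cell c3">1</div>
    <div class="table-cell c-true">Account A</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# One header label per div.column-name element.
header = [
    div.get_text(strip=True)
    for div in soup.select("div.column-header > div.column-name")
]

# One list per div.table-row; keep only as many cells as there are headers,
# which drops the trailing "Account A" cell in each row.
rows = [
    [cell.get_text(strip=True) for cell in row.select("div.table-cell")][: len(header)]
    for row in soup.select("div.table-row")
]

print(header)
print(rows)
```

The class selector `div.table-cell` matches any div that has `table-cell` among its classes, so it still works when a cell also carries a class such as `c1`, and the result does not change if the page is served minified on a single line.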

To render the table, you can install BeautifulTable:

from beautifultable import BeautifulTable

table = BeautifulTable()
# Top-level headers are hard-coded; each empty string pads the second column of a group
table.column_headers = ["Product", "", "Account A", "", "Account B", ""]
table.append_row(first_row)
for row in rows:
    table.append_row(row)
print(table)

Output:

+---------+-------------+-----------+--------+-----------+--------+
| Product |             | Account A |        | Account B |        |
+---------+-------------+-----------+--------+-----------+--------+
|  NAME   | DESCRIPTION |  Value1   | Value2 |  Value3   | Value4 |
+---------+-------------+-----------+--------+-----------+--------+
| bronze  |  3rd tier   |     0     |   1    |     1     |   1    |
+---------+-------------+-----------+--------+-----------+--------+
| silver  |  2nd tier   |     1     |   0    |     3     |   0    |
+---------+-------------+-----------+--------+-----------+--------+
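Since the question mentions pandas, the same data could also be loaded into a DataFrame with a two-level column header. This is only a sketch: the values are typed in by hand from the output above rather than scraped, and the `groups` list is a hypothetical, hard-coded mapping of each column to its top-level label.

```python
import pandas as pd

# Values copied from the output above; in practice they would come from the parsing step.
first_row = ["NAME", "DESCRIPTION", "Value1", "Value2", "Value3", "Value4"]
rows = [
    ["bronze", "3rd tier", "0", "1", "1", "1"],
    ["silver", "2nd tier", "1", "0", "3", "0"],
]

# Top-level label for each column (hard-coded assumption, mirroring the desired layout).
groups = ["Product", "Product", "Account A", "Account A", "Account B", "Account B"]

# Two header levels: group on top, column name underneath.
columns = pd.MultiIndex.from_arrays([groups, first_row])
df = pd.DataFrame(rows, columns=columns)

# Scraped cells are strings; convert the value columns to numbers.
for col in df.columns:
    if col[0] != "Product":
        df[col] = pd.to_numeric(df[col])

print(df)
```

A single column is then addressed by its pair of labels, for example `df[("Account A", "Value1")]`.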

You can also format the tabular data with the tabulate library.

Upvotes: 4
