nvachhan

Reputation: 93

Writing to CSV after parsing with BeautifulSoup results in separated values or an empty output file

<table class="table_grid">
    <thead>
        <tr>
            <th>Name</th>
            <th>User Name</th>
            <th>Role</th>
            <th>Branch</th>
            <th>Actions</th>

        </tr>
    </thead>
    <tbody>

                <tr>
                    <td>First Name1</td>
                    <td>[email protected]</td>
                    <td>Processor</td>

                    <td></td>   

                                <td><a href="/Account/EditUser?id=4c4e6455-7d27-4abf-93c9-5584f09674d5">Edit</a></td>

                </tr>

                <tr>
                    <td>First Name2</td>
                    <td>[email protected]</td>
                    <td>Officer</td>

                    <td></td>   

                                <td><a href="/Account/EditUser?id=267e90eb-6fa4-4286-88d9-738913cdd7ea">Edit</a></td>

                </tr>

    </tbody>
</table>

I am trying to parse the text from this table and write it to a CSV file. It writes to the file, but every letter ends up in its own column: I get |F|i|r|s|t| when I am looking for |First|.

import csv
from bs4 import BeautifulSoup

soup = BeautifulSoup(browser.page_source, 'html.parser')

table = soup.find('table', attrs={'class':'table_grid'})

with open('test1.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for body in table.findAll('tr'):
        rows = body.getText()
        writer.writerow(rows)

This is my code. Looking at similar questions here, I tried to fix it with the following:

writer.writerow([rows])

However, this resulted in a blank CSV file. Any idea what I am doing wrong here?

Upvotes: 1

Views: 230

Answers (1)

alecxe

Reputation: 474241

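I think you meant to write every cell into its own column. writerow() expects a sequence of fields, so passing it a string writes one character per column. Pass a list of cell texts instead: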

with open('test1.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in table('tr'):
        writer.writerow([cell.get_text(strip=True) for cell in row(['td', 'th'])])

Note that I'm using some shortcuts here: calling a tag directly, as in table('tr'), is a concise alternative to table.find_all('tr'), and row(['td', 'th']) is likewise short for row.find_all(['td', 'th']).
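
To see the string-versus-list difference in isolation, here is a minimal sketch that uses only the standard library. The helper name write_row and the sample values are made up for illustration, and it writes to an in-memory buffer rather than test1.csv:

```python
import csv
import io


def write_row(row):
    """Write one row with csv.writer and return the resulting CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)
    return buffer.getvalue()


# A str is itself a sequence, so each character becomes a separate field.
as_string = write_row("First")

# A list produces one field per element, which is what the loop above builds.
as_list = write_row(["First Name1", "Processor"])

print(repr(as_string))  # 'F,i,r,s,t\r\n'
print(repr(as_list))    # 'First Name1,Processor\r\n'
```

The list comprehension in the loop above plays the role of the second call: it hands writerow() one string per cell instead of one string per row.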

Also, an alternative way to dump the HTML table into CSV would be to use the pandas library, in particular the pandas.read_html() function and the DataFrame.to_csv() method.
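
A rough sketch of the pandas route, assuming pandas is installed together with a parser that read_html() can use (for example lxml). The HTML here is a trimmed-down stand-in for the table in the question, not the real page source:

```python
import io

import pandas as pd

# Simplified stand-in for browser.page_source (an assumption for this sketch).
html = """
<table class="table_grid">
  <thead><tr><th>Name</th><th>Role</th></tr></thead>
  <tbody>
    <tr><td>First Name1</td><td>Processor</td></tr>
    <tr><td>First Name2</td><td>Officer</td></tr>
  </tbody>
</table>
"""

# read_html() returns a list of DataFrames, one per matching <table>.
tables = pd.read_html(io.StringIO(html), attrs={"class": "table_grid"})
df = tables[0]

# With no path argument, to_csv() returns the CSV text as a string.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write a file instead of a string, something like df.to_csv('test1.csv', index=False) should work; index=False leaves out the DataFrame's row index column.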

Upvotes: 1
