Nate Walker

Reputation: 217

Write Headers Once in Python CSV Writer Loop

Below is a scraper that loops through two websites, scrapes each team's roster information, collects the fields into lists, and writes the lists to a CSV file. Everything works, except that the header row written by writerow is repeated in the CSV file each time the scraper moves on to the next website. Is it possible to adjust the CSV portion of the code so the headers appear only once while the scraper loops through multiple websites? Thanks in advance!

import requests
import csv
from bs4 import BeautifulSoup

team_list={'yankees','redsox'}

for team in team_list:
    page = requests.get('http://m.{}.mlb.com/roster/'.format(team))
    soup = BeautifulSoup(page.text, 'html.parser')

    soup.find(class_='nav-tabset-container').decompose()
    soup.find(class_='column secondary span-5 right').decompose()

    roster = soup.find(class_='layout layout-roster')
    names = [n.contents[0] for n in roster.find_all('a')]
    ids = [n['href'].split('/')[2] for n in roster.find_all('a')]
    number = [n.contents[0] for n in roster.find_all('td', index='0')]
    handedness = [n.contents[0] for n in roster.find_all('td', index='3')]
    height = [n.contents[0] for n in roster.find_all('td', index='4')]
    weight = [n.contents[0] for n in roster.find_all('td', index='5')]
    DOB = [n.contents[0] for n in roster.find_all('td', index='6')]
    team = [soup.find('meta',property='og:site_name')['content']] * len(names)

    with open('MLB_Active_Roster.csv', 'a', newline='') as fp:
        f = csv.writer(fp)
        f.writerow(['Name','ID','Number','Hand','Height','Weight','DOB','Team'])
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))

Upvotes: 4

Views: 5276

Answers (3)

RoadRunner

Reputation: 26315

Just write the header before the loop, and have the loop within the with context manager:

import requests
import csv
from bs4 import BeautifulSoup

team_list = {'yankees', 'redsox'}

headers = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']

# 1. wrap everything in context manager
with open('MLB_Active_Roster.csv', 'a', newline='') as fp:
    f = csv.writer(fp)

    # 2. write headers before anything else
    f.writerow(headers)

    # 3. now process the loop
    for team in team_list:
        ...  # do everything else: scrape the roster and call f.writerows(...) as in the question

Defining the headers outside the loop, next to team_list, also leads to cleaner code.
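
One thing to watch with the append mode ('a') used above: running the script a second time would add another header row to the existing file. A minimal sketch of one way around that, assuming it is acceptable to test whether the file already has content before writing. The helper name append_rows is hypothetical and not part of the original code:

```python
import csv
import os

HEADERS = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']


def append_rows(path, rows, headers=HEADERS):
    """Append rows to a CSV file; write the header only if the file is new or empty.

    append_rows is a hypothetical helper used for illustration.
    """
    # Decide before opening: opening in 'a' mode creates the file if it is missing.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='') as fp:
        writer = csv.writer(fp)
        if needs_header:
            writer.writerow(headers)
        writer.writerows(rows)
```

Inside the loop you would call something like append_rows('MLB_Active_Roster.csv', zip(names, ids, number, handedness, height, weight, DOB, team)) once per team.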

Upvotes: 1

archang31

Reputation: 31

Another option is to write the header once before the for loop, so you do not have to check whether it has already been written.

import requests
import csv
from bs4 import BeautifulSoup

team_list={'yankees','redsox'}

with open('MLB_Active_Roster.csv', 'w', newline='') as fp:
    f = csv.writer(fp)
    f.writerow(['Name','ID','Number','Hand','Height','Weight','DOB','Team'])

for team in team_list:
    # ... do your bs4 and parsing stuff here, as in the question ...

    with open('MLB_Active_Roster.csv', 'a', newline='') as fp:
        f = csv.writer(fp)
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))

You can also open the file just once instead of three times:

import requests
import csv
from bs4 import BeautifulSoup

team_list={'yankees','redsox'}

with open('MLB_Active_Roster.csv', 'w', newline='') as fp:
    f = csv.writer(fp)
    f.writerow(['Name','ID','Number','Hand','Height','Weight','DOB','Team'])

    for team in team_list:
        # ... do your bs4 and parsing stuff here, as in the question ...

        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))
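
For reference, here is the same open-once structure as a small self-contained sketch. The scrape_team function is a hypothetical stand-in for the requests/BeautifulSoup code in the question, and the row it returns is placeholder data, not real roster information. A list is used for the teams instead of a set so the output order is predictable:

```python
import csv

HEADERS = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']


def scrape_team(team):
    """Hypothetical stand-in for the scraping code in the question.

    Returns a list of row tuples; the values are placeholders.
    """
    return [('Player One', '000001', '1', 'R/R', '6-0', '200', '1/1/90', team)]


def write_rosters(path, teams, scrape=scrape_team):
    """Open the file once, write the header once, then write each team's rows."""
    with open(path, 'w', newline='') as fp:
        writer = csv.writer(fp)
        writer.writerow(HEADERS)
        for team in teams:
            writer.writerows(scrape(team))
```

Calling write_rosters('MLB_Active_Roster.csv', ['yankees', 'redsox']) would then produce one header row followed by the rows for each team.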

Upvotes: 2

Harun ERGUL

Reputation: 5942

Using a flag variable to track whether the header has been written may be helpful. Once the header has been written, it will not be written a second time.

header_added = False
for team in team_list:
    # ... do some stuff: scrape and parse as in the question ...

    with open('MLB_Active_Roster.csv', 'a', newline='') as fp:
        f = csv.writer(fp)
        if not header_added:
            f.writerow(['Name','ID','Number','Hand','Height','Weight','DOB','Team'])
            header_added = True
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))
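
If the rows are available as dictionaries rather than parallel lists, the standard library's csv.DictWriter is a possible alternative: its writeheader() method is called once, and the field names live in one place. This is a sketch of a different technique from the flag above, and the function name write_players is hypothetical:

```python
import csv

FIELDS = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']


def write_players(path, players):
    """Write an iterable of dicts (one per player) with a single header row.

    write_players is a hypothetical helper used for illustration.
    """
    with open(path, 'w', newline='') as fp:
        writer = csv.DictWriter(fp, fieldnames=FIELDS)
        writer.writeheader()       # header is written exactly once
        writer.writerows(players)  # each dict is keyed by the names in FIELDS
```

Each player dict could be built inside the loop with dict(zip(FIELDS, row)) for every row produced by the zip(...) call in the question.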

Upvotes: 3
