Reputation: 541
Pulling my hair out a bit trying to figure out how to get my scraped data in an order that I can use. Basically I want the scraped data to look like this:
DateTime_match,DateTime_odds,sport,team,team_type,odds
05/11/2015 16:00:00,01/11/2015 10:37:58,NFL,New York Giants,home,1.21
05/11/2015 16:00:00,01/11/2015 10:37:58,NFL,New Orleans,away,3.5
I've done the hard part already by putting together the Python code for the elements I want, just need some help organizing it.
My Python code:
import requests, re
from bs4 import BeautifulSoup

url = "http://www.sportsbet.com.au/betting/american-football"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')

# get sport
g_data = soup.find_all("div", {"class": "accordion-main"})
for items in g_data:
    sport = items.findAll('span', {'class': 'market_title'})[0]
    print sport.text

# get odds and teams
g_data = soup.find_all("div", {"class": "accordion-body"})
for items in g_data:
    teama = items.findAll('span', {'class': 'team-name ib'})[0]
    teamb = items.findAll('span', {'class': 'team-name ib'})[3]
    pricea = items.findAll('span')[1]
    priceb = items.findAll('span')[7]
    print teama.text + teamb.text + pricea.text.strip() + priceb.text.strip()

# get game and time
g_data = soup.find_all("div", {"class": "market-name cfix"})
for items in g_data:
    game = items.findAll('a', {'class': 'link'})[0]
    time = items.findAll('span', {'class': 'start-time timezone_time'})[0]
    print game.text + time.text

# get date
g_data = soup.find_all("div", {"class": "bettypes-header cfix"})
for items in g_data:
    game_date = items.findAll('span', {'class': 'timezone_day_date date'})[0]
    print game_date.text
This is the output:
NFL Matches
Indianapolis ColtsCarolina Panthers3.331.35
Cleveland BrownsCincinnati Bengals5.651.16
Miami DolphinsBuffalo Bills2.301.65
Washington RedskinsNew England Patriots10.501.06
Oakland RaidersPittsburgh Steelers2.941.43
St. Louis RamsMinnesota Vikings2.251.68
Atlanta FalconsSan Francisco 49ers1.502.69
New York GiantsTampa Bay Buccaneers1.802.06
Philadelphia EaglesDallas Cowboys1.782.09
Chicago BearsSan Diego Chargers2.711.49
Indianapolis Colts At Carolina Panthers12:30
Cleveland Browns At Cincinnati Bengals12:25
Miami Dolphins At Buffalo Bills05:00
Washington Redskins At New England Patriots05:00
Oakland Raiders At Pittsburgh Steelers05:00
St. Louis Rams At Minnesota Vikings05:00
Atlanta Falcons At San Francisco 49ers08:05
New York Giants At Tampa Bay Buccaneers08:05
Philadelphia Eagles At Dallas Cowboys12:30
Chicago Bears At San Diego Chargers12:30
Tuesday 03/11/2015
Friday 06/11/2015
Monday 09/11/2015
Tuesday 10/11/2015
Upvotes: 0
Views: 63
Reputation: 2415
You need to store the data in variables, organize it, and only then print it in the format you want. A bare print writes each value out the moment it is called, so the pieces can't be recombined afterwards. Use variables, lists, dictionaries, etc. to hold the data, then build your desired output from them.
For instance, instead of printing the teams and prices in a row like you are doing here:
print teama.text + teamb.text + pricea.text.strip() + priceb.text.strip()
Store data in variables:
team_a = teama.text
team_b = teamb.text
price_a = pricea.text.strip()
price_b = priceb.text.strip()
Or even by match (using a dictionary):
match = {
    'team_a': teama.text,
    'team_b': teamb.text,
    'price_a': pricea.text.strip(),
    'price_b': priceb.text.strip(),
}
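Since your loop handles many matches, one option is to append each dictionary to a list so that everything is still available once the loop has finished. This is only a rough sketch: the scraped list below is a hypothetical stand-in for the values you pull out of the page inside your loop (teama.text, teamb.text and the unstripped prices), and the padding around the prices is an assumption, there only to show why strip() is kept.

```python
# Hypothetical stand-in for the values scraped inside the loop:
# (teama.text, teamb.text, pricea.text, priceb.text)
scraped = [
    ("Indianapolis Colts", "Carolina Panthers", " 3.33 ", " 1.35 "),
    ("Cleveland Browns", "Cincinnati Bengals", " 5.65 ", " 1.16 "),
]

matches = []  # filled inside the loop, used after it
for team_a, team_b, price_a, price_b in scraped:
    # One dictionary per match, same keys as the single-match example
    matches.append({
        'team_a': team_a,
        'team_b': team_b,
        'price_a': price_a.strip(),
        'price_b': price_b.strip(),
    })
```

After the loop, matches holds one dictionary per game, so you can sort, filter or combine them before anything is printed.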
Then print it as you wish:
# With variables
print '%s,%s,%s,%s' % (team_a, team_b, price_a, price_b)
# With dictionary
print '%s,%s,%s,%s' % (match['team_a'], match['team_b'], match['price_a'], match['price_b'])
By doing this you will get:
Indianapolis Colts,Carolina Panthers,3.33,1.35
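To get closer to the one-row-per-team layout in your question, each stored match could be expanded into two rows and written with the csv module. Treat this as a sketch under some assumptions: match_to_rows is a name I made up, it is written for Python 3, the sport and the two date/time strings are placeholders copied from your example rather than scraped values, and I am guessing that the first team in "A At B" is the away side.

```python
import csv
import io

def match_to_rows(match, sport, match_dt, odds_dt):
    """Expand one match dictionary into two rows, one per team."""
    # Assumption: the first team listed ("A At B") is the away team.
    return [
        [match_dt, odds_dt, sport, match['team_a'], 'away', match['price_a']],
        [match_dt, odds_dt, sport, match['team_b'], 'home', match['price_b']],
    ]

match = {
    'team_a': 'Indianapolis Colts',
    'team_b': 'Carolina Panthers',
    'price_a': '3.33',
    'price_b': '1.35',
}

buf = io.StringIO()  # swap for an open file to write to disk
writer = csv.writer(buf, lineterminator='\n')
writer.writerow(['DateTime_match', 'DateTime_odds', 'sport',
                 'team', 'team_type', 'odds'])
# Placeholder sport and date/time values taken from the question's example
writer.writerows(match_to_rows(match, 'NFL',
                               '05/11/2015 16:00:00',
                               '01/11/2015 10:37:58'))
csv_text = buf.getvalue()
```

Using csv.writer instead of joining strings by hand means a value containing a comma would be quoted correctly. The match date and time would still need to be collected in the same loop (or paired up afterwards) so they can be passed in for each match.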
For more information about how to format strings in Python, see the string formatting section of the Python documentation.
Upvotes: 1