Reputation: 69
I have a list like the one below:
['<h2 class="title-6-bold"> Premier League </h2>', '<span class="title-8-medium simple-match-card-team__name"> Fulham </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Liverpool </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Bournemouth </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Aston Villa </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>', '<span class="title-8-medium simple-match-card-team__name"> Leeds </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Wolves </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Newcastle United </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Nottingham Forest </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>', '<span class="title-8-medium simple-match-card-team__name"> Tottenham </span>', '<span class="title-7-bold simple-match-card-team__score"> 4 </span>', '<span class="title-8-medium simple-match-card-team__name"> Southampton </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Everton </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>', '<span class="title-8-medium simple-match-card-team__name"> Chelsea </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>',
 '<h2 class="title-6-bold"> Bundesliga </h2>', '<span class="title-8-medium simple-match-card-team__name"> 1. FC Union Berlin </span>', '<span class="title-7-bold simple-match-card-team__score"> 3 </span>', '<span class="title-8-medium simple-match-card-team__name"> Hertha BSC </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> M\'gladbach </span>', '<span class="title-7-bold simple-match-card-team__score"> 3 </span>', '<span class="title-8-medium simple-match-card-team__name"> Hoffenheim </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Augsburg </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>', '<span class="title-8-medium simple-match-card-team__name"> SC Freiburg </span>', '<span class="title-7-bold simple-match-card-team__score"> 4 </span>', '<span class="title-8-medium simple-match-card-team__name"> VfL Bochum </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Mainz 05 </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> VfL Wolfsburg </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Werder Bremen </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Borussia Dortmund </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Bayer Leverkusen </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>',
 '<h2 class="title-6-bold"> Scottish Premiership </h2>', '<span class="title-8-medium simple-match-card-team__name"> Aberdeen </span>', '<span class="title-7-bold simple-match-card-team__score"> 4 </span>', '<span class="title-8-medium simple-match-card-team__name"> St. Mirren </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Motherwell </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> St. Johnstone </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Rangers </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Kilmarnock </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>', '<span class="title-8-medium simple-match-card-team__name"> Ross County </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Celtic </span>', '<span class="title-7-bold simple-match-card-team__score"> 3 </span>',
 '<h2 class="title-6-bold"> Ligue 1 Uber Eats </h2>', '<span class="title-8-medium simple-match-card-team__name"> Strasbourg </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Monaco </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Clermont </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>', '<span class="title-8-medium simple-match-card-team__name"> PSG </span>', '<span class="title-7-bold simple-match-card-team__score"> 5 </span>']
I am trying to extract the data for a few top leagues and discard the others. Following another example, I have this code:
leagues = (['Premier League', 'Spanish La Liga', 'Bundesliga', 'Italian Serie A','Ligue 1 Uber Eats', 'Champions League'])
data = [[l[l.index(left) + len(left):l.index(right)] for l in data if i in l] for i in leagues]
But I am not getting the expected result, which should look like this:
[['Premier League', * all matches of PL], ['Bundesliga', * all Bundesliga matches]].
Please help me with this as I have been burning my head over this for quite a long time now.
Thanks
Upvotes: 0
Views: 57
Reputation: 147266
You can iterate over the values in your list, updating the league, teams and scores whenever you see the matching tag, and then writing a match to the result once you have two teams and two scores. I've created a dict of matches with the league as the key; it should be reasonably easy to change the format if you want something else (e.g. with list(result.items())).
import re
from collections import defaultdict

result = defaultdict(list)
for d in data:
    # A league heading starts a new group and resets the pending match
    l = re.search(r'>([^<]+)</h2>', d)
    if l is not None:
        league = l.group(1).strip()
        teams = []
        scores = []
        continue
    # Team name
    t = re.search(r'name">([^<]+)</span>', d)
    if t is not None:
        teams.append(t.group(1).strip())
        continue
    # Score; after the second score the match is complete
    s = re.search(r'score">\s*(\d+)\s*</span>', d)
    if s is not None:
        scores.append(int(s.group(1)))
        if len(scores) == 2:
            result[league].append({ 'teams' : teams[:], 'scores' : scores[:] })
            teams = []
            scores = []
Output (for your sample data):
{
    'Premier League': [
        {'teams': ['Fulham', 'Liverpool'], 'scores': [2, 2]},
        {'teams': ['Bournemouth', 'Aston Villa'], 'scores': [2, 0]},
        {'teams': ['Leeds', 'Wolves'], 'scores': [2, 1]},
        {'teams': ['Newcastle United', 'Nottingham Forest'], 'scores': [2, 0]},
        {'teams': ['Tottenham', 'Southampton'], 'scores': [4, 1]},
        {'teams': ['Everton', 'Chelsea'], 'scores': [0, 1]}
    ],
    'Bundesliga': [
        {'teams': ['1. FC Union Berlin', 'Hertha BSC'], 'scores': [3, 1]},
        {'teams': ["M'gladbach", 'Hoffenheim'], 'scores': [3, 1]},
        {'teams': ['Augsburg', 'SC Freiburg'], 'scores': [0, 4]},
        {'teams': ['VfL Bochum', 'Mainz 05'], 'scores': [1, 2]},
        {'teams': ['VfL Wolfsburg', 'Werder Bremen'], 'scores': [2, 2]},
        {'teams': ['Borussia Dortmund', 'Bayer Leverkusen'], 'scores': [1, 0]}
    ],
    'Scottish Premiership': [
        {'teams': ['Aberdeen', 'St. Mirren'], 'scores': [4, 1]},
        {'teams': ['Motherwell', 'St. Johnstone'], 'scores': [1, 2]},
        {'teams': ['Rangers', 'Kilmarnock'], 'scores': [2, 0]},
        {'teams': ['Ross County', 'Celtic'], 'scores': [1, 3]}
    ],
    'Ligue 1 Uber Eats': [
        {'teams': ['Strasbourg', 'Monaco'], 'scores': [1, 2]},
        {'teams': ['Clermont', 'PSG'], 'scores': [0, 5]}
    ]
}
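The dict above still contains every league, whereas the question asks for only the leagues in its leagues list, in the form [[league, * matches], ...]. One possible way to get there is sketched below. The function names group_matches and to_league_lists are made up for this illustration, the data list is a shortened sample in the same shape as the question's list, and the grouping loop is the same idea as the code above wrapped in a function.

```python
import re
from collections import defaultdict

# Shortened sample in the same shape as the list in the question.
data = [
    '<h2 class="title-6-bold"> Premier League </h2>',
    '<span class="title-8-medium simple-match-card-team__name"> Fulham </span>',
    '<span class="title-7-bold simple-match-card-team__score"> 2 </span>',
    '<span class="title-8-medium simple-match-card-team__name"> Liverpool </span>',
    '<span class="title-7-bold simple-match-card-team__score"> 2 </span>',
    '<h2 class="title-6-bold"> Scottish Premiership </h2>',
    '<span class="title-8-medium simple-match-card-team__name"> Aberdeen </span>',
    '<span class="title-7-bold simple-match-card-team__score"> 4 </span>',
    '<span class="title-8-medium simple-match-card-team__name"> St. Mirren </span>',
    '<span class="title-7-bold simple-match-card-team__score"> 1 </span>',
    '<h2 class="title-6-bold"> Bundesliga </h2>',
    '<span class="title-8-medium simple-match-card-team__name"> Augsburg </span>',
    '<span class="title-7-bold simple-match-card-team__score"> 0 </span>',
    '<span class="title-8-medium simple-match-card-team__name"> SC Freiburg </span>',
    '<span class="title-7-bold simple-match-card-team__score"> 4 </span>',
]

# The leagues to keep, copied from the question.
leagues = ['Premier League', 'Spanish La Liga', 'Bundesliga',
           'Italian Serie A', 'Ligue 1 Uber Eats', 'Champions League']


def group_matches(items):
    """Group matches by league (hypothetical helper wrapping the loop above)."""
    result = defaultdict(list)
    league, teams, scores = None, [], []
    for item in items:
        m = re.search(r'>([^<]+)</h2>', item)
        if m:
            # New league heading: reset the pending match.
            league, teams, scores = m.group(1).strip(), [], []
            continue
        m = re.search(r'name">([^<]+)</span>', item)
        if m:
            teams.append(m.group(1).strip())
            continue
        m = re.search(r'score">\s*(\d+)\s*</span>', item)
        if m:
            scores.append(int(m.group(1)))
            if len(scores) == 2:
                # Two scores seen: the match is complete.
                result[league].append({'teams': teams, 'scores': scores})
                teams, scores = [], []
    return result


def to_league_lists(result, wanted):
    """Return [[league, match, ...], ...] for wanted leagues only, in `wanted` order."""
    return [[name, *result[name]] for name in wanted if name in result]


selected = to_league_lists(group_matches(data), leagues)
print(selected)
```

With this sample, selected holds two entries, one for Premier League and one for Bundesliga; Scottish Premiership is dropped because it is not in leagues, and leagues with no matches in the data are skipped rather than added as empty lists.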
Upvotes: 1