nelalx

Reputation: 69

List comprehension - group data based on occurrence of specific elements in a list

I have a list like the one below:

['<h2 class="title-6-bold"> Premier League </h2>', '<span class="title-8-medium simple-match-card-team__name"> Fulham </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Liverpool </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Bournemouth </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Aston Villa </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>', '<span class="title-8-medium simple-match-card-team__name"> Leeds </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Wolves </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Newcastle United </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Nottingham Forest </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>', '<span class="title-8-medium simple-match-card-team__name"> Tottenham </span>', '<span class="title-7-bold simple-match-card-team__score"> 4 </span>', '<span class="title-8-medium simple-match-card-team__name"> Southampton </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Everton </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>', '<span class="title-8-medium simple-match-card-team__name"> Chelsea </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<h2 class="title-6-bold"> Bundesliga </h2>', '<span class="title-8-medium simple-match-card-team__name"> 1. 
FC Union Berlin </span>', '<span class="title-7-bold simple-match-card-team__score"> 3 </span>', '<span class="title-8-medium simple-match-card-team__name"> Hertha BSC </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> M\'gladbach </span>', '<span class="title-7-bold simple-match-card-team__score"> 3 </span>', '<span class="title-8-medium simple-match-card-team__name"> Hoffenheim </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Augsburg </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>', '<span class="title-8-medium simple-match-card-team__name"> SC Freiburg </span>', '<span class="title-7-bold simple-match-card-team__score"> 4 </span>', '<span class="title-8-medium simple-match-card-team__name"> VfL Bochum </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Mainz 05 </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> VfL Wolfsburg </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Werder Bremen </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Borussia Dortmund </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Bayer Leverkusen </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>', '<h2 class="title-6-bold"> Scottish Premiership </h2>', '<span class="title-8-medium simple-match-card-team__name"> Aberdeen </span>', '<span class="title-7-bold simple-match-card-team__score"> 4 </span>', '<span 
class="title-8-medium simple-match-card-team__name"> St. Mirren </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Motherwell </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> St. Johnstone </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Rangers </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Kilmarnock </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>', '<span class="title-8-medium simple-match-card-team__name"> Ross County </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Celtic </span>', '<span class="title-7-bold simple-match-card-team__score"> 3 </span>', '<h2 class="title-6-bold"> Ligue 1 Uber Eats </h2>', '<span class="title-8-medium simple-match-card-team__name"> Strasbourg </span>', '<span class="title-7-bold simple-match-card-team__score"> 1 </span>', '<span class="title-8-medium simple-match-card-team__name"> Monaco </span>', '<span class="title-7-bold simple-match-card-team__score"> 2 </span>', '<span class="title-8-medium simple-match-card-team__name"> Clermont </span>', '<span class="title-7-bold simple-match-card-team__score"> 0 </span>', '<span class="title-8-medium simple-match-card-team__name"> PSG </span>', '<span class="title-7-bold simple-match-card-team__score"> 5 </span>']

I am trying to extract the data for a few top leagues and discard the others. Following another example, I have this code:

leagues = ['Premier League', 'Spanish La Liga', 'Bundesliga', 'Italian Serie A', 'Ligue 1 Uber Eats', 'Champions League']

data = [[l[l.index(left) + len(left):l.index(right)] for l in data if i in l] for i in leagues]

But I am not getting the expected result, which should look like this:

[['Premier League', * all matches of PL], ['Bundesliga', * all Bundesliga matches]].

Please help me with this, as I have been racking my brain over it for quite a long time now.

Thanks

Upvotes: 0

Views: 57

Answers (1)

Nick

Reputation: 147266

You can iterate over the values in your list, updating the league, teams and scores whenever you see the matching tag, and writing a match to the result once you have two teams and two scores. I've created a dict of matches keyed by league; it should be reasonably easy to change the format if you want something else (e.g. with list(result.items())).

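import re
from collections import defaultdict

# `data` is the list of HTML strings from the question; it is assumed to
# start with an <h2> league heading, as the sample does.
result = defaultdict(list)
for d in data:
    # League heading: remember it and reset the pending match.
    h = re.search(r'>([^<]+)</h2>', d)
    if h is not None:
        league = h.group(1).strip()
        teams = []
        scores = []
        continue
    # Team name: collect it for the current match.
    t = re.search(r'name">([^<]+)</span>', d)
    if t is not None:
        teams.append(t.group(1).strip())
        continue
    # Score: once both scores are in, store the match and start a new one.
    s = re.search(r'score">\s*(\d+)\s*</span>', d)
    if s is not None:
        scores.append(int(s.group(1)))
        if len(scores) == 2:
            result[league].append({'teams': teams[:], 'scores': scores[:]})
            teams = []
            scores = []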

Output (for your sample data):

{
  'Premier League': [
    {'teams': ['Fulham', 'Liverpool'], 'scores': [2, 2]},
    {'teams': ['Bournemouth', 'Aston Villa'], 'scores': [2, 0]},
    {'teams': ['Leeds', 'Wolves'], 'scores': [2, 1]},
    {'teams': ['Newcastle United', 'Nottingham Forest'], 'scores': [2, 0]},
    {'teams': ['Tottenham', 'Southampton'], 'scores': [4, 1]},
    {'teams': ['Everton', 'Chelsea'], 'scores': [0, 1]}
  ],
  'Bundesliga': [
    {'teams': ['1. FC Union Berlin', 'Hertha BSC'], 'scores': [3, 1]},
    {'teams': ["M'gladbach", 'Hoffenheim'], 'scores': [3, 1]},
    {'teams': ['Augsburg', 'SC Freiburg'], 'scores': [0, 4]},
    {'teams': ['VfL Bochum', 'Mainz 05'], 'scores': [1, 2]},
    {'teams': ['VfL Wolfsburg', 'Werder Bremen'], 'scores': [2, 2]},
    {'teams': ['Borussia Dortmund', 'Bayer Leverkusen'], 'scores': [1, 0]}
  ],
  'Scottish Premiership': [
    {'teams': ['Aberdeen', 'St. Mirren'], 'scores': [4, 1]},
    {'teams': ['Motherwell', 'St. Johnstone'], 'scores': [1, 2]},
    {'teams': ['Rangers', 'Kilmarnock'], 'scores': [2, 0]},
    {'teams': ['Ross County', 'Celtic'], 'scores': [1, 3]}
  ],
  'Ligue 1 Uber Eats': [
    {'teams': ['Strasbourg', 'Monaco'], 'scores': [1, 2]},
    {'teams': ['Clermont', 'PSG'], 'scores': [0, 5]}
  ]
}
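
The question also asks to keep only a few leagues and to get a list of lists rather than a dict. A minimal sketch of that last step, assuming result has the shape shown above (the small result literal here is a shortened stand-in so the snippet runs on its own, and wanted is just an illustrative name):

```python
# Leagues to keep, copied from the question.
leagues = ['Premier League', 'Spanish La Liga', 'Bundesliga',
           'Italian Serie A', 'Ligue 1 Uber Eats', 'Champions League']

# Shortened stand-in for the `result` dict built by the loop above.
result = {
    'Premier League': [
        {'teams': ['Fulham', 'Liverpool'], 'scores': [2, 2]},
        {'teams': ['Bournemouth', 'Aston Villa'], 'scores': [2, 0]},
    ],
    'Scottish Premiership': [
        {'teams': ['Aberdeen', 'St. Mirren'], 'scores': [4, 1]},
    ],
    'Ligue 1 Uber Eats': [
        {'teams': ['Clermont', 'PSG'], 'scores': [0, 5]},
    ],
}

# Keep only the wanted leagues that actually occur, in the order of
# `leagues`, and flatten each one to [league, match, match, ...].
wanted = [[league, *result[league]] for league in leagues if league in result]

for league, *matches in wanted:
    print(league)
    for match in matches:
        (team_a, team_b), (score_a, score_b) = match['teams'], match['scores']
        print(f'  {team_a} {score_a} - {score_b} {team_b}')
```

The `league in result` test matters when result is a defaultdict: a membership check does not create a key, whereas indexing a missing league would silently add an empty entry.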

Upvotes: 1
