Biggen

Reputation: 295

Can't figure out regex match for list

I'm not sure where to begin pulling just the team names out of the small snippet of the list below. There seems to be so much variation. Obviously, there is whitespace preceding every team name. But the names are not fixed length, and some have hyphens, apostrophes, periods, and spaces inside the team name itself. There is always at least one space after the last word of the team name and before the single "A" or double "AA" at the end.

&nbsp  1  Clemson              A  =
&nbsp  5  Ohio State           A  =
&nbsp155  Tennessee-Martin     AA =
&nbsp152  Louisiana-Monroe     A  =
&nbsp104  Hawai'i              A  =
&nbsp193  VMI                  AA =
&nbsp202  Stephen F. Austin    AA =

Any Regex guys want to take a crack at this?

Upvotes: 0

Views: 41

Answers (3)

Nishanth Matha

Reputation: 6081

Try using the following regex:

\d\s+(.*?)\s+=

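    - \d matches a digit
    - \s+ followed by one or more whitespace characters
    - (.*?) captures anything, lazily (as few characters as possible)
    - \s+ followed by one or more whitespace characters
    - = followed by a literal `=`

The captured group will give you the team name.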
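A minimal sketch of how this pattern behaves, assuming Python and a few sample lines copied from the question (the variable names are illustrative). The lazy group stops only at the whitespace before `=`, so the capture still carries the column padding and the A/AA marker:

```python
import re

# Sample lines copied from the question; names here are illustrative only.
lines = [
    "&nbsp  1  Clemson              A  =",
    "&nbsp155  Tennessee-Martin     AA =",
    "&nbsp202  Stephen F. Austin    AA =",
]

pattern = re.compile(r"\d\s+(.*?)\s+=")

captures = []
for line in lines:
    match = pattern.search(line)
    if match:
        captures.append(match.group(1))

for text in captures:
    # repr() makes the padding inside the capture visible
    print(repr(text))
```

Each printed value begins with the team name and ends with `A` or `AA`.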

Regex Demo

Edit: if A/AA isn't part of the team name, use:

\d\s+(.*?)\s+[A]+\s+=

Updated Regex
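As a rough check, the updated pattern can be run over the full sample from the question. This sketch assumes Python and that the data is available as a single multi-line string (here called `raw`, an illustrative name):

```python
import re

# The sample data from the question, as one multi-line string.
raw = """
&nbsp  1  Clemson              A  =
&nbsp  5  Ohio State           A  =
&nbsp155  Tennessee-Martin     AA =
&nbsp152  Louisiana-Monroe     A  =
&nbsp104  Hawai'i              A  =
&nbsp193  VMI                  AA =
&nbsp202  Stephen F. Austin    AA =
"""

pattern = re.compile(r"\d\s+(.*?)\s+[A]+\s+=")

# With a single capture group, findall returns the captured strings.
teams = pattern.findall(raw)
print(teams)
```

Names such as "Stephen F. Austin" still come through whole: the `A` in "Austin" is not followed by whitespace and `=`, so the lazy group keeps extending until it reaches the final A/AA column.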

Upvotes: 1

zwer

Reputation: 25769

That's relatively easy:

import re

raw = """
&nbsp  1  Clemson              A  =
&nbsp  5  Ohio State           A  =
&nbsp155  Tennessee-Martin     AA =
&nbsp152  Louisiana-Monroe     A  =
&nbsp104  Hawai'i              A  =
&nbsp193  VMI                  AA =
&nbsp202  Stephen F. Austin    AA =
"""

teams = re.findall(r"&nbsp\s*\d+\s+(.*?)\s+A+\s+=", raw)

for team in teams:
    print(team)

# Clemson
# Ohio State
# Tennessee-Martin
# Louisiana-Monroe
# Hawai'i
# VMI
# Stephen F. Austin

Upvotes: 2

gidim

Reputation: 2323

How about something like this? No regex required.

lines is a list of strings, where each string is a line from your data.

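def hasNumbers(inputString):
    return any(char.isdigit() for char in inputString)


for line in lines:
    splits = line.split()  # split on runs of whitespace, so no empty strings
    if not splits:
        continue  # skip blank lines
    # The rank is either glued to "&nbsp" ("&nbsp155") or a separate token ("&nbsp", "1").
    start = 1 if hasNumbers(splits[0]) else 2
    # Drop the trailing "A"/"AA" and "=" tokens; rejoin multi-word names.
    teamName = " ".join(splits[start:-2])

    print(teamName)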

Upvotes: 1
