Reputation: 295
I'm not sure where to begin figuring out how to pull just the team names out of the small snippet of the list below. There seems to be so much variation. Obviously, there is a single space preceding every team name. But the names are not fixed length, and some have hyphens, apostrophes, and spaces inside the team name itself. There is always at least one space after the last word of the team name and before the single "A" or double "AA" at the end.
  1 Clemson A =
  5 Ohio State A =
 155 Tennessee-Martin AA =
 152 Louisiana-Monroe A =
 104 Hawai'i A =
 193 VMI AA =
 202 Stephen F. Austin AA =
Any Regex guys want to take a crack at this?
Upvotes: 0
Views: 41
Reputation: 6081
Try using the following regex:
\d\s+(.*?)\s+=
- \d matches a digit
- \s+ followed by one or more spaces
- (.*?) anything, matched lazily (as few characters as possible)
- \s+ followed by one or more spaces
- = followed by `=`
The captured group will give you the team name.
Edit: if the A/AA isn't part of the team name, use:
\d\s+(.*?)\s+[A]+\s+=
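For reference, here is a minimal Python sketch of how the second pattern could be applied line by line. The helper name `team_name` and the sample list are made up for illustration, and the pattern uses `\d+` and `A+` instead of `\d` and `[A]+`, which should match the same lines.

```python
import re

# Variant of the pattern above: \d+ consumes the whole rank number,
# and A+ is equivalent to [A]+.
PATTERN = re.compile(r"\d+\s+(.*?)\s+A+\s+=")

def team_name(line):
    """Return the captured team name, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.group(1) if match else None

# A few lines taken from the question.
sample = [
    "  1 Clemson A =",
    " 155 Tennessee-Martin AA =",
    " 104 Hawai'i A =",
    " 202 Stephen F. Austin AA =",
]
names = [team_name(line) for line in sample]
print(names)
```

The lazy `(.*?)` is what lets names with inner spaces through: it keeps growing until the rest of the pattern (spaces, one or more `A`, spaces, `=`) can match.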
Upvotes: 1
Reputation: 25769
That's relatively easy:
import re
raw = """
  1 Clemson A =
  5 Ohio State A =
 155 Tennessee-Martin AA =
 152 Louisiana-Monroe A =
 104 Hawai'i A =
 193 VMI AA =
 202 Stephen F. Austin AA =
"""
teams = re.findall(r" \s*\d+\s+(.*?)\s+A+\s+=", raw)
for team in teams:
    print(team)
# Clemson
# Ohio State
# Tennessee-Martin
# Louisiana-Monroe
# Hawai'i
# VMI
# Stephen F. Austin
Upvotes: 2
Reputation: 2323
How about something like this? No regex required.
`lines` is a list of strings, where each string is a line from your data.
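def hasNumbers(inputString):
    return any(char.isdigit() for char in inputString)

for line in lines:
    splits = line.split()  # split on any run of whitespace; no empty strings
    if not splits:
        continue  # skip blank lines
    if hasNumbers(splits[0]):
        splits = splits[1:]  # drop the leading rank number
    teamName = " ".join(splits[:-2])  # drop the trailing "A"/"AA" and "="
    print(teamName)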
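One thing to watch with a split-based approach on this data: `line.split(" ")` and `line.split()` behave differently when a line has leading or repeated spaces. A small sketch, using one line from the question (the variable names are just for illustration):

```python
line = "  5 Ohio State A ="

# Splitting on a single space keeps an empty string for every extra space.
by_space = line.split(" ")

# Splitting with no argument collapses runs of whitespace and trims the ends.
by_whitespace = line.split()

# Dropping the first token (rank) and the last two ("A"/"AA" and "=")
# leaves the words of the team name, which can be rejoined with spaces.
team = " ".join(by_whitespace[1:-2])

print(by_space)
print(by_whitespace)
print(team)
```

Because the name can span several tokens, slicing and rejoining is probably safer than picking a single index.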
Upvotes: 1