Reputation: 21
I am trying to scrape the Swedish members of parliament with Beautiful Soup. When I run the scraper I get "ValueError: too many values to unpack (expected 3)".
The script outputs a CSV, but only with five names. The sixth person on the list is Alm Ericson, Janine (MP). I suppose the problem is that she has two last names, Alm Ericson, while the code expects exactly three values: first name, last name and party.
How should I code the field split so that it also works for double last names?
The names on the page are written as
Last_name, first_name (party)
Code:
import urllib.request
import bs4 as bs
import csv

source = urllib.request.urlopen("https://www.riksdagen.se/sv/ledamoter-partier/").read()
soup = bs.BeautifulSoup(source, "lxml")
data = []
for span in soup.find_all("span", {"class": "fellow-name"}):
    cleanednames = span.text.strip()
    data.append(cleanednames)  # fields are appended to a list rather than printed

with open("riksdagsledamoter.csv", "w") as stream:
    fieldnames = ["Last_Name", "First_Name", "Party"]
    var = csv.DictWriter(stream, fieldnames=fieldnames)
    var.writeheader()
    for item in data:
        last_name, First_name, party = item.split()  # splitting data into 3 fields (this line raises the ValueError)
        last_name = last_name.replace(",", "")  # removing ',' from last name
        party = party.replace("(", "").replace(")", "")  # removing "()" from party
        var.writerow({"Last_Name": last_name, "First_Name": First_name, "Party": party})  # writing to csv row
Upvotes: 1
Views: 108
Reputation: 6099
Here is a simple regex that should do the trick:
import re
print(re.match(r"(.*), (.*) \((.*)\)", 'Alm Ericson, Janine (MP)').groups())
Inspired by Corentin's answer.
Upvotes: 2
Reputation: 5006
Well, splitting on spaces is clearly not a good solution here (or you should split on the comma and the parentheses instead of on spaces).
Using a regexp:
import re
re.match(r'([^,]*), ([^(]*) \((.*)\)', 'Alm Ericson, Janine (MP)').groups()
Returns
('Alm Ericson', 'Janine', 'MP')
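To show how the pattern above could slot into the CSV-writing loop from the question, here is a minimal sketch. The helper names `parse_member` and `write_members` are my own, the second sample name is made up for illustration, and an in-memory buffer stands in for the real file. Skipping entries that do not match is an assumption; you may prefer to log or raise instead.

```python
import csv
import io
import re

# Pattern from the answer above: "Last_name, First_name (Party)"
NAME_PATTERN = re.compile(r"([^,]*), ([^(]*) \((.*)\)")


def parse_member(text):
    """Return (last_name, first_name, party), or None if the text does not match."""
    match = NAME_PATTERN.match(text.strip())
    if match is None:
        return None
    return tuple(part.strip() for part in match.groups())


def write_members(names, stream):
    """Write the parsed names as CSV rows to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=["Last_Name", "First_Name", "Party"])
    writer.writeheader()
    for name in names:
        parsed = parse_member(name)
        if parsed is None:
            continue  # assumption: silently skip entries with an unexpected layout
        last_name, first_name, party = parsed
        writer.writerow({"Last_Name": last_name, "First_Name": first_name, "Party": party})


# The first entry is from the question; the second is a hypothetical name.
sample = ["Alm Ericson, Janine (MP)", "Svensson, Anna Karin (S)"]
buffer = io.StringIO()
write_members(sample, buffer)
print(buffer.getvalue())
```

In the original script you would pass the `data` list and the opened file instead of `sample` and `buffer`.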
Upvotes: 4
Reputation: 84465
I guess you could also use a function to return the parts in a list (not as clean as the answers already given), e.g.
def getParts(inputString):
    list1 = inputString.split(",")
    list2 = list1[1].split("(")
    finalList = [list1[0], list2[0].strip(), list2[1].replace(")", "")]
    return finalList

inputString = 'Alm Ericson, Janine (MP)'
print(getParts(inputString))
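The same splitting idea can also be sketched with `str.partition`, which returns empty strings for missing pieces instead of raising an `IndexError` when the comma or parenthesis is absent. The function name `split_member` is hypothetical, and this is only one possible variant, not a replacement for the answers above.

```python
def split_member(text):
    """Split 'Last_name, First_name (Party)' without regular expressions."""
    # Everything before the first comma is the last name, inner spaces included.
    last_name, _, rest = text.partition(",")
    # Everything before the first "(" is the first name; what follows is the party.
    first_name, _, party = rest.partition("(")
    return last_name.strip(), first_name.strip(), party.strip().rstrip(")")


print(split_member("Alm Ericson, Janine (MP)"))
```

Because missing parts come back as empty strings, the caller should check for them before writing a row.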
Upvotes: 0