Reputation: 1392
This script reads from a source whose lines consist of an artist name followed by parenthesized information about which country the artist comes from and whether the artist cancelled.
A typical line looks like:
Odd Nordstoga (NO) (Cancelled), 20-08-2012, Blå
As I import the data I decode it from UTF-8, and this works fine. Uncommenting the second comment in the else block of the remove_extra() function shows that all values are of type unicode.
However, when the value is returned and assigned to another variable, and the type of that variable is tested, the majority of them seem to be of NoneType.
Why does this happen, and how can it be corrected? The error seems to occur between the function's return and the assignment to the new variable.
# -*- charset: utf-8 -*-
import re

f1 = open("oya_artister_2011.csv")
artister = []
navnliste = []

PATTERN = re.compile(r"(.*)(\(.*\))")
TEST_PAT = re.compile(r"\(.*\)")

def remove_extra(tekst):
    if re.search(PATTERN, tekst) > 1:
        after = re.findall(PATTERN, tekst)[0][0]
        #print "tekst is: %s " % tekst
        #print "and of type: %s" % type(tekst)
        remove_extra(after)
    else:
        #print "will return: ", tekst
        #print "of type: %s" % type(tekst)
        return tekst

for line in f1:
    navn, _rest = line.split(",",1)
    navn = navn.decode("utf-8")
    artister.append(navn)

for artist in artister:
    ny_artist = remove_extra(artist)
    #print "%s" % ny_artist
    print "of type: %s" % type(ny_artist)
Upvotes: 0
Views: 126
Reputation: 23265
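Try
    return remove_extra(after)
instead of just
    remove_extra(after)
Without the return, the result of the recursive call is discarded, so every call that takes the if branch falls off the end of the function and implicitly returns None. Only lines with no parentheses at all reach the else branch directly, which is why most of the values come back as NoneType.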
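For illustration, here is a minimal sketch of the function with that change applied. It is not the asker's exact code: it assumes the same PATTERN as in the question, tests the match object directly instead of comparing it with 1, and uses a hard-coded sample string in place of the CSV file.

```python
import re

# Same pattern as in the question: some text followed by a parenthesized group.
PATTERN = re.compile(r"(.*)(\(.*\))")

def remove_extra(tekst):
    """Strip trailing parenthesized groups from tekst, one per recursive call."""
    match = PATTERN.search(tekst)
    if match:
        # Pass the recursive result back up to the caller; without
        # `return` here the function would end and yield None.
        return remove_extra(match.group(1))
    # Base case: no parentheses left, so return the text itself.
    return tekst

# Hypothetical sample input, modelled on the line shown in the question.
ny_artist = remove_extra(u"Odd Nordstoga (NO) (Cancelled)")
print(type(ny_artist))
```

With this version every code path returns a string. Note that whitespace before a removed parenthesis is left in place, so calling .strip() on the result may be useful.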
Upvotes: 1