Reputation: 351
I have the following data. My goal is to check whether each row is part of the US or not.
data = [', Accomack, Virginia, USA',
'Elkin, Surry, North Carolina, USA',
'Philippines',
'(null)',
'Texas, United States',
'Kingston, Washington, Rhode Island, United States']
I first used the following to split them into lists within the list and remove the white space:
place = []
for d in data:
    row = d.split(',')
    rowlist = []
    for r in row:
        r_stripped = r.strip()
        rowlist.append(r_stripped)
    place.append(rowlist)
place
I got the following output, which is what I expected:
[['', 'Accomack', 'Virginia', 'USA'],
['Elkin', 'Surry', 'North Carolina', 'USA'],
['Philippines'],
['(null)'],
['Texas', 'United States'],
['Kingston', 'Washington', 'Rhode Island', 'United States']]
Then I used the following to try to see if each item is in the US or not:
country = []
US = ['USA', 'United States']
for p in place:
    for item in US:
        if item in p:
            c = 'US'
        else:
            c = 'Non-US'
    country.append(c)
country
For some reason, the code is not able to capture the first two rows as part of US.
['Non-US', 'Non-US', 'Non-US', 'Non-US', 'US', 'US']
Even more curious: if I remove the else: c = 'Non-US' branch, everything becomes 'US'.
Can anyone please tell me what I am not doing right? Thanks!
Upvotes: 1
Views: 1598
Reputation: 155
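# Keep only the rows that mention the US (substring match on the raw strings).
# list() is needed in Python 3, where filter() returns a lazy iterator.
new_data = list(filter(lambda x: 'USA' in x or 'United States' in x, data))
new_data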
Upvotes: 1
Reputation: 52213
You should check whether any item in p is in the list named US by updating the inner loop like below:
>>> for p in place:
...     for item in p:
...         if item in US:
...             c = "US"
...             break
...     else:
...         c = "Non-US"
...     country.append(c)
The else clause of the inner for loop runs only if the loop completes without hitting break. As soon as an item is found in the US list, you break out of the inner loop and move on to the next p.
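To make the difference concrete, here is a minimal sketch that runs both versions side by side on the question's data. The function names classify_last_wins and classify_for_else are made up for illustration and are not part of the original code. The first mirrors the question's loop, where c is reassigned for every entry of US, so only the check against the last entry ('United States') survives. The second uses the for/else form shown above.

```python
place = [
    ['', 'Accomack', 'Virginia', 'USA'],
    ['Elkin', 'Surry', 'North Carolina', 'USA'],
    ['Philippines'],
    ['(null)'],
    ['Texas', 'United States'],
    ['Kingston', 'Washington', 'Rhode Island', 'United States'],
]
US = ['USA', 'United States']


def classify_last_wins(row):
    """Mirror of the question's loop: each pass overwrites c,
    so the result depends only on the last entry of US."""
    c = None
    for item in US:
        if item in row:
            c = 'US'
        else:
            c = 'Non-US'
    return c


def classify_for_else(row):
    """for/else version: stop at the first match; the else branch
    runs only when the loop ends without a break."""
    for item in row:
        if item in US:
            c = 'US'
            break
    else:
        c = 'Non-US'
    return c


print([classify_last_wins(p) for p in place])
# ['Non-US', 'Non-US', 'Non-US', 'Non-US', 'US', 'US']
print([classify_for_else(p) for p in place])
# ['US', 'US', 'Non-US', 'Non-US', 'US', 'US']
```

The first two rows contain 'USA', so the original loop sets c to 'US' and then immediately overwrites it with 'Non-US' when it checks 'United States'.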
--
However, you can make it more readable by using any() and a list comprehension:
>>> ["US" if any(item in US for item in p) else "Non-US" for p in place]
['US', 'US', 'Non-US', 'Non-US', 'US', 'US']
--
any() also helps you eliminate the inner loop entirely:
>>> for p in place:
...     if any(item in US for item in p):
...         c = "US"
...     else:
...         c = "Non-US"
...     country.append(c)
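If it helps, the splitting step from the question and the any() check can be folded into one small helper that works directly on the raw strings. This is only a sketch: the names US_NAMES and classify are hypothetical, and using a set for the lookup is a choice made here, not something the original code does.

```python
data = [', Accomack, Virginia, USA',
        'Elkin, Surry, North Carolina, USA',
        'Philippines',
        '(null)',
        'Texas, United States',
        'Kingston, Washington, Rhode Island, United States']

# Hypothetical name; a set gives a simple exact-match membership test.
US_NAMES = {'USA', 'United States'}


def classify(raw):
    """Split a raw row on commas, strip whitespace, and label it."""
    parts = [part.strip() for part in raw.split(',')]
    return 'US' if any(part in US_NAMES for part in parts) else 'Non-US'


country = [classify(d) for d in data]
print(country)
# ['US', 'US', 'Non-US', 'Non-US', 'US', 'US']
```

Because the comparison is against whole, stripped fields rather than substrings, a row is labelled 'US' only when one of its fields is exactly 'USA' or 'United States'.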
Upvotes: 2