RJL

Reputation: 351

Check if items in one list exist in another list

I have the following data. My goal is to check whether each row is part of the US or not.

data = [', Accomack, Virginia, USA',
 'Elkin, Surry, North Carolina, USA',
 'Philippines',
 '(null)',
 'Texas, United States',
 'Kingston, Washington, Rhode Island, United States']

I first used the following to split them into lists within the list and remove the white space:

place = []
for d in data:
    row = d.split(',')
    rowlist = []
    for r in row:
        r_stripped = r.strip()
        rowlist.append(r_stripped)
    place.append(rowlist)

place

I got the following output, which is what I expected:

[['', 'Accomack', 'Virginia', 'USA'],
 ['Elkin', 'Surry', 'North Carolina', 'USA'],
 ['Philippines'],
 ['(null)'],
 ['Texas', 'United States'],
 ['Kingston', 'Washington', 'Rhode Island', 'United States']]

Then I used the following to try to see if each item is in the US or not:

country = []
US = ['USA', 'United States']

for p in place:
    for item in US:
        if item in p:
            c = 'US'
        else:
            c = 'Non-US'
    country.append(c)

country

For some reason, the code does not recognize the first two rows as part of the US:

['Non-US', 'Non-US', 'Non-US', 'Non-US', 'US', 'US']

Even more curiously, if I remove the "else: c = 'Non-US'" branch, everything becomes 'US'.

Can anyone please tell me what I am not doing right? Thanks!

Upvotes: 1

Views: 1598

Answers (2)

Yuan Wang

Reputation: 155

# Keep only the rows that mention the US; in Python 3, filter() returns an iterator, so wrap it in list()
new_data = list(filter(lambda x: 'USA' in x or 'United States' in x, data))
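Filtering keeps only the US rows, whereas the question asks for a label per row. The sketch below shows one way to get both from the same substring test. The names US_MARKERS and label are hypothetical, introduced only for illustration. Keep in mind that a substring test is case-sensitive and can produce false positives on any string that happens to contain "USA".

```python
data = [', Accomack, Virginia, USA',
        'Elkin, Surry, North Carolina, USA',
        'Philippines',
        '(null)',
        'Texas, United States',
        'Kingston, Washington, Rhode Island, United States']

# Hypothetical name: the substrings that mark a row as belonging to the US.
US_MARKERS = ('USA', 'United States')

def label(row):
    """Return 'US' if any marker occurs as a substring of row, else 'Non-US'."""
    return 'US' if any(marker in row for marker in US_MARKERS) else 'Non-US'

# One label per row, in the same order as data.
labels = [label(row) for row in data]

# Only the rows labelled 'US' (what the filter() call selects).
us_rows = [row for row in data if label(row) == 'US']

print(labels)
```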

Upvotes: 1

Ozgur Vatansever

Reputation: 52213

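Your inner loop iterates over US and reassigns c on every pass, so only the last entry ('United States') decides the result. Instead, check whether any item in p is in the list named US by updating the inner loop like this: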

>>> for p in place:
...     for item in p:
...         if item in US:
...             c = "US"
...             break
...     else:
...         c = "Non-US"
...     country.append(c)

The else clause of the inner for loop runs only if the loop finishes without hitting break. As soon as an item of p is found in the US list, you break out of the inner loop and move on to the next p.
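To see the difference side by side, here is a small self-contained sketch. The function names label_buggy and label_fixed are hypothetical. The first reproduces the loop from the question and the second uses for/else as described above.

```python
US = ['USA', 'United States']

def label_buggy(parts, us_names):
    # Same logic as the question: c is reassigned on every pass,
    # so only the last name in us_names decides the outcome.
    c = None
    for name in us_names:
        if name in parts:
            c = 'US'
        else:
            c = 'Non-US'
    return c

def label_fixed(parts, us_names):
    # Stop at the first match; the else branch belongs to the for loop
    # and runs only when the loop ends without a break.
    for item in parts:
        if item in us_names:
            result = 'US'
            break
    else:
        result = 'Non-US'
    return result

row = ['Elkin', 'Surry', 'North Carolina', 'USA']
print(label_buggy(row, US))  # Non-US ('United States' is checked last and is absent)
print(label_fixed(row, US))  # US
```

This also explains the second observation in the question: without the else branch, c is never reset, so once the first row sets it to 'US' every later row appends the same stale value.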

--

However, you can make it more readable by using any() inside a list comprehension:

>>> ["US" if any(item in US for item in p) else "Non-US" for p in place]
['US', 'US', 'Non-US', 'Non-US', 'US', 'US']

--

any() also helps you eliminate the inner loop totally:

>>> for p in place:
...     if any(item in US for item in p):
...         c = "US"
...     else:
...         c = "Non-US"
...     country.append(c)
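Putting the pieces together, the whole task can be sketched end to end as follows. This variant swaps the US list for a set and uses set.isdisjoint() in place of any(). That is an alternative I am assuming is acceptable here, not something the answer above prescribes.

```python
data = [', Accomack, Virginia, USA',
        'Elkin, Surry, North Carolina, USA',
        'Philippines',
        '(null)',
        'Texas, United States',
        'Kingston, Washington, Rhode Island, United States']

# A set instead of a list, so membership tests do not depend on its size.
US = {'USA', 'United States'}

# Split each row on commas and strip surrounding whitespace from every part.
place = [[part.strip() for part in row.split(',')] for row in data]

# A row is 'US' when it shares at least one element with the US set.
country = ['Non-US' if US.isdisjoint(p) else 'US' for p in place]

print(country)
```

Because each row is split first, this matches whole fields only, so a place name that merely contains the letters "USA" is not counted as a match.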

Upvotes: 2
