Reputation: 445
Let's suppose I have a Series (or DataFrame) s1
, for example list of all Universities and Colleges in the USA:
University
0 Searcy Harding University
1 Angwin Pacific Union College
2 Fairbanks University of Alaska Fairbanks
3 Ann Arbor University of Michigan
And another Series (od DataFrame) s2
, for example list of all cities in the USA:
City
0 Searcy
1 Angwin
2 New York
3 Ann Arbor
And my desired output (bascially an intersection of s1
and s2
):
Uni City
0 Searcy
1 Angwin
2 Fairbanks
3 Ann Arbor
The thing is: I'd like to create a Series that consists of cities but only these, that have a university/college. My very first thought was to remove "University" or "College" parts from the s1
, but it turns out that it is not enough, as in case of Angwin Pacific Union College
. Then I thought of leaving only the first word, but that excludes Ann Arbor
.
Finally, I got a series of all the cities s2
and now I'm trying to use it as a filter (something similiar to .contains()
or .isin()
), so if a string s1
(Uni name) contains any of the elements of s2
(city name), then return only the city name.
My question is: how to do it in a neat way?
Upvotes: 2
Views: 67
Reputation: 26676
Data Used
s=pd.DataFrame({'University':['Searcy Harding University','Angwin Pacific Union College','Fairbanks University of Alaska Fairbanks','Ann Arbor University of Michigan']})
s2=pd.DataFrame({'City':['Searcy','Angwin','Fairbanks','Ann Arbor']})
Convert s2.City
to set to create an iterator
st=set(s2.City.unique().tolist())
Calculate s['Uni City']
using the next() function to return the next item from the iterator.
s['Uni City']=s['University'].apply(lambda x: next((i for i in st if i in x)), np.nan)
Outcome
Upvotes: 1
Reputation: 148910
I would try to build a list comprehension of cities that are contained in at least one university name:
pd.Series([i for i in s2 if s1.str.contains(i).any()], name='Uni City')
With your example data it gives:
0 Searcy
1 Angwin
2 Ann Arbor
Name: Uni City, dtype: object
Upvotes: 1