Filter Series/DataFrame by another DataFrame

Question

Let's suppose I have a Series (or DataFrame) s1, for example a list of all universities and colleges in the USA:

                     University
0            Searcy Harding University
1          Angwin Pacific Union College
2    Fairbanks University of Alaska Fairbanks
3        Ann Arbor University of Michigan

And another Series (or DataFrame) s2, for example a list of all cities in the USA:

      City
0    Searcy
1    Angwin 
2   New York 
3   Ann Arbor

And my desired output (basically an intersection of s1 and s2):

     Uni City
0     Searcy
1     Angwin 
2    Fairbanks 
3    Ann Arbor

The thing is, I'd like to create a Series that consists of cities, but only those that have a university or college. My first thought was to remove the "University" or "College" parts from s1, but that turns out not to be enough, as in the case of Angwin Pacific Union College. Then I thought of keeping only the first word, but that excludes Ann Arbor. Finally, I got a Series of all the cities, s2, and now I'm trying to use it as a filter (something similar to .contains() or .isin()): if a string in s1 (university name) contains any of the elements of s2 (city name), then return only the city name.

My question is: how can I do this in a neat way?

Serge Ballesta · Accepted Answer

I would try to build a list comprehension of cities that are contained in at least one university name:

pd.Series([i for i in s2 if s1.str.contains(i).any()], name='Uni City')

With your example data it gives:

0       Searcy
1       Angwin
2    Ann Arbor
Name: Uni City, dtype: object
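
For completeness, here is a self-contained sketch of the same idea that can be run as-is. It rebuilds the sample data from the question and makes two assumptions that are not in the original: the city names may carry stray trailing whitespace (so they are stripped first), and they should be matched as literal text rather than as regular expressions (hence regex=False). The names cities, uni_cities and per_row are only illustrative. The last part is an optional variant that returns one city per university row via str.extract.

```python
import re
import pandas as pd

# Sample data rebuilt from the question.
s1 = pd.Series(
    [
        "Searcy Harding University",
        "Angwin Pacific Union College",
        "Fairbanks University of Alaska Fairbanks",
        "Ann Arbor University of Michigan",
    ],
    name="University",
)
s2 = pd.Series(["Searcy", "Angwin ", "New York ", "Ann Arbor"], name="City")

# Assumption: trailing whitespace in the city names is noise, so strip it.
cities = s2.str.strip()

# Keep each city that occurs in at least one university name.
# regex=False treats the city as plain text, so characters such as "."
# or "(" in a city name are not interpreted as regex syntax.
uni_cities = pd.Series(
    [city for city in cities if s1.str.contains(city, regex=False).any()],
    name="Uni City",
)
print(uni_cities.tolist())

# Optional variant: one city per university row (NaN when no city matches).
# Longer names go first so that a longer city wins over a shorter one
# that happens to be its prefix.
pattern = "|".join(re.escape(c) for c in sorted(cities, key=len, reverse=True))
per_row = s1.str.extract(f"({pattern})", expand=False)
print(per_row.tolist())
```

Note that Fairbanks cannot appear in either result with this sample data, because it is not present in s2. Both versions do plain substring matching, so a city whose name is part of an unrelated word in a university name would also match.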
