JavierSando
JavierSando

Reputation: 339

Filtering a dataframe using a list of strings

I have the following dataframe, called pop_df:

pop_df
Out[33]: 
            2014       2015       2016       2017       2018       2019
Geo                                                                    
AL     2892394.0  2885796.0  2875592.0  2876591.0  2870324.0  2862427.0
AL0    2892394.0  2885796.0  2875592.0  2876591.0  2870324.0  2862427.0
AL01    844921.0   836448.0   830981.0   826904.0   819793.0   813758.0
AL011   134332.0   131054.0   129056.0   125579.0   120978.0   118948.0
AL012   276058.0   277989.0   280205.0   284823.0   289626.0   290126.0
         ...        ...        ...        ...        ...        ...
UKN12   142028.0   142756.0   143363.0   143746.0   144105.0   144367.0
UKN13   139774.0   140222.0   140752.0   141368.0   141994.0   142565.0
UKN14   137722.0   139426.0   140691.0   141917.0   143286.0   144771.0
UKN15   136332.0   136904.0   137492.0   138000.0   138441.0   138948.0
UKN16   114696.0   115171.0   115581.0   116057.0   116612.0   117051.0

[2034 rows x 6 columns]

I also have a list of EU country codes:

EuropeanUnion=["BE","BG","CZ","DK","DE","EE","IE","EL","ES","FR","HR","IT","CY","LV","LT","LU","HU","MT","NL","AT","PL","PT","RO","SI","SK","FI","SE"]

I would like to filter the dataframe to keep only the rows starting by the country codes that are in the list. I tried with pop_df.index.str.startswith but I couldn't make it work. I'd appreciate any help you can provide. Thanks in advance!

Upvotes: 3

Views: 476

Answers (2)

kasper
kasper

Reputation: 92

I would recommend you to use slicing indexes (as string) up to two first letters and use pandas .isin method applied to country codes variable as a boolean mask:

eu_countries_filtered = pop_df[pop_df.index.str[:2].isin(EuropeanUnion)]

Upvotes: 2

Dani Mesejo
Dani Mesejo

Reputation: 61910

It seems that Geo is the index, so you could do:

result = df[df.index.str.match(rf'\b{"|".join(EuropeanUnion)}')]

Output (dummy)

           2014      2015      2016      2017      2018      2019
Geo                                                              
BE011  134332.0  131054.0  129056.0  125579.0  120978.0  118948.0
DE13   139774.0  140222.0  140752.0  141368.0  141994.0  142565.0

From the documentation on str.match:

Determine if each string starts with a match of a regular expression.

The expression rf'\b{"|".join(EuropeanUnion)}', builds a regular expression pattern that will match any of the country codes.

Upvotes: 3

Related Questions