Filtering CSV with pandas

Question

I'm having difficulty filtering this CSV file.

Here are a few entries from the csv table:

Name      Info                                 Bio
Alice     Woman: 21y (USA)                     Actress
Breonna   Woman: (France)                      Singer
Carla     Woman: 30y (Trinidad and Tobago)     Actress
Diana     Woman: (USA)                         Singer

I'm trying to filter the Info row to get all a list of countries and their frequency. I'm also trying to do the same thing with age. As you can see, not all women post their age.

I've tried

women= pd.read_csv('women.csv')
women_count = pd.Series(' '.join(women.Info).split()).value_counts()

However, that splits everything and outputs:

Woman:     4
(USA)      2
21y        1
(Trinidad  1
and        1
Tobago)    1
30y        1

I should add I've tried women_filtered = women[women['Info'] == '(USA)'] and that doesn't work

My questions are:

How can I split the string just to filter by countries especially since all the countries are in parenthesis?
How can I filter for entries that do not have age?

Thank You

jtorca · Accepted Answer

import pandas as pd

df = pd.DataFrame(
{'Name':['Alice', 'Breonna', 'Carla', 'Diana'],
 'Info':['Woman: 21y (USA)', 'Woman: (France)', 'Woman: 30y (Trinidad and Tobago)', 'Woman: (USA)'],
 'Bio':['Actress', 'Singer', 'Actress', 'Singer']}
)

# defining columns using regex
df['country'] = df['Info'].str.extract('$([^$]+)\)')
df['age'] = df['Info'].str.extract('[\s]+([\d]{2})y[\s]+').astype(float)
df['noage'] = df['age'].isnull().astype(int)

# frequency of countries
sizes = df.groupby('country').size()
sizes

This will output the frequencies.

country
France                 1
Trinidad and Tobago    1
USA                    2
dtype: int64

I would lookup how to write regex expressions so you can learn how to extract information from strings yourself. Pythex.org is a nice website to try out regex expressions in Python and has some useful tips.

Filtering CSV with pandas

Answers (2)

Related Questions