LeCoda
LeCoda

Reputation: 1016

Pandas - filtering data sets and combining them

I am filtering ancestry from the ABS in Australia. I am taking ancestry data as below.

allvic_url='ABS_CENSUS2011_T09/TOT+1+2+3+4+Z.TOT+TOTP+1101+1102+6101+3204+2303+2101+5201+2305+2306+3205+3304+7106+2201+3103+6902+4106+3206+3104+1201+1202+3307+3308+2102+3213+7115+9215+3106+4907+5107+2103+OTH+Z.2.SA2..A/all?detail=Full&dimensionAtObservation=AllDimensions'

I am then having issues showing the changes in ancestry over time by using the newer 2016 dataset, as the api is extremely confusing... here http://stat.data.abs.gov.au/# . (I want to try and show how demographics are changing in suburbs as a datapoint).

The output I am aiming for is a dataframe on a suburb level that shows ancestry changes over time. If the api can give me more historical data on top of this that would be brilliant as well.

Suburb  Ancestry Main   Ancestry Secondary  Ancestry increased most Percentage increase 2016-2011   

(i.e., showing the ancestry average is chinese, adn has changed x% over time).

Would appreciate help on solving this issue - Thanks!

Upvotes: 5

Views: 389

Answers (1)

Zach Brookler
Zach Brookler

Reputation: 345

I believe (at least part of) the question you're asking is how to filter a pandas dataframe using regex. For data I used some of the venue values you provided in your array. For your use case, this would be done as follows:

from pandas import DataFrame

df_with_venues = DataFrame(
    {
        "Venue Category": [
            "Speakeasy",
            "Boutique",
            "Peruvian Restaurant",
            "Bakery",
            "Vietnamese Restaurant",
            "Asian Restaurant",
            "Hotel",
            "Whisky Bar",
            "Street Art",
            "Italian Restaurant",
            "Bookstore",
            "French Restaurant",
            "Café",
            "Sushi Restaurant",
            "Australian Restaurant",
            "Indian Restaurant",
            "Coffee Shop",
            "Bar",
            "Wine Bar",
            "Theater",
            "BBQ Joint",
            "Burger Joint",
        ]
    }
)

regex_filter_venues = df_with_venues[df_with_venues["Venue Category"].str.match(r".*Restaurant")]

You specified in your question that you wanted restaurant so I provided that in this solution, but you should be able to replace the string, ".*Restaurant", with any valid regex. Also for reference if anyone wants it, the documentation for pandas regex is here and there's also a great tutorial on different ways you can use pandas regex here

Upvotes: 2

Related Questions