Reputation: 9
I have a pandas dataframe that consists of 5000 rows with different countries and emission data, and it looks like the following:
| country | year | emissions |
|---|---|---|
| peru | 2020 | 1000 |
|  | 2019 | 900 |
|  | 2018 | 800 |
The country label is the index. For example,
df = emissions.loc[['peru']]
gives me a new dataframe containing only the emission data for peru. My goal is to use a variable instead of 'peru' and store the country-specific emission data in a new dataframe.
What I'm looking for is code that works the same way as this:
country = 'zanzibar'
df = emissions.loc[[{country}]]
From what I can tell, the problem arises with the iloc function, which does not accept variables as input. Is there a way to get around this?
In other words, I want to be able to create a new dataframe with country-specific emission data based on a variable that matches one of the countries in emissions.index, without having to change anything but that variable.
One way might be to iterate, or maybe to write a function of some kind? Thank you in advance for any help.
Upvotes: 0
Views: 971
Reputation: 1
I'm not sure this is exactly what you're asking, but here is a way to turn each country name into a variable.
Because a variable name can't contain a space (" "), you first have to replace spaces with underscores ("_"), in case some of your 'country' values are country names with more than one word. For example, with this code:
df['country'] = df['country'].replace(' ', '_', regex=True)
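A minimal run of this step on a small made-up frame (the country names and numbers are illustrative, not from the question). It assumes 'country' is an ordinary column; in the question it is the index, so you would call reset_index() first, assuming the index is named country.

```python
import pandas as pd

# Illustrative data: two of the names contain spaces
df = pd.DataFrame({
    'country': ['peru', 'south africa', 'new zealand'],
    'emissions': [1000, 2000, 3000],
})

# With regex=True, ' ' is treated as a pattern and replaced inside each
# string, so 'south africa' becomes 'south_africa'
df['country'] = df['country'].replace(' ', '_', regex=True)
print(df['country'].tolist())
# ['peru', 'south_africa', 'new_zealand']
```

df['country'].str.replace(' ', '_') gives the same result through the string accessor.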
Once the country names are in the new format, you can collect all of them from the dataframe with .unique() and store them in a new variable:
country_name = df['country'].unique()
After running that, all the unique values in the 'country' column are stored in an array called country_name.
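A quick check of what unique() gives back, on illustrative data: an array (not a Python list) holding each value once, in order of first appearance. Calling .tolist() converts it to a plain list if you need one.

```python
import pandas as pd

# Illustrative data with repeated country names
df = pd.DataFrame({
    'country': ['peru', 'peru', 'chile', 'south_africa', 'chile'],
    'emissions': [1000, 900, 300, 2000, 200],
})

country_name = df['country'].unique()
print(country_name.tolist())  # ['peru', 'chile', 'south_africa']
```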
Next, use a for loop to create a new variable for each country name:
for i in country_name:
    locals()[i] = df[df['country'] == i]
Here, locals() is used to turn each country-name string from country_name into a variable name, and df[df['country'] == i] subsets the dataframe to the rows whose country equals that name. Note that this only works at module level, where locals() is the same dictionary as globals(); inside a function, assignments through locals() do not create real variables.
After that, there is a new variable for each country name in the 'country' column.
Hopefully this helps solve your problem.
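A common alternative, not part of the answer above, keeps one frame per country in a dictionary instead of in separate variables. It works the same inside a function and needs no space-to-underscore step, because the keys can be any string. A sketch on made-up data:

```python
import pandas as pd

# Illustrative data; 'country' is an ordinary column here
df = pd.DataFrame({
    'country': ['peru', 'peru', 'south africa', 'south africa'],
    'year': [2020, 2019, 2020, 2019],
    'emissions': [1000, 900, 2000, 1900],
})

# One DataFrame per country, keyed by the original (unmodified) name
frames = {name: group for name, group in df.groupby('country')}

# Only this variable needs to change to pick a different country
country = 'south africa'
print(frames[country])
```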
Upvotes: 0
Reputation: 447
An alternative approach, where you don't use the country name as the index:
import pandas as pd
emissions = pd.DataFrame({'Country': ['Peru', 'Peru', 'Peru', 'Chile', 'Chile', 'Chile'], 'Year': [2021, 2020, 2019, 2021, 2020, 2019], 'Emissions': [100, 200, 400, 300, 200, 100]})
country = 'Peru'
Then to filter:
df = emissions[emissions.Country == country]
or
df = emissions.loc[emissions.Country == country]
Giving:
  Country  Year  Emissions
0    Peru  2021        100
1    Peru  2020        200
2    Peru  2019        400
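Since the question mentions wrapping this in a function, here is a minimal sketch of one. The name get_country_emissions is hypothetical; it applies the same boolean filter and returns a copy, so an unknown country simply gives an empty dataframe instead of an error.

```python
import pandas as pd

emissions = pd.DataFrame({
    'Country': ['Peru', 'Peru', 'Peru', 'Chile', 'Chile', 'Chile'],
    'Year': [2021, 2020, 2019, 2021, 2020, 2019],
    'Emissions': [100, 200, 400, 300, 200, 100],
})

def get_country_emissions(frame, country):
    """Hypothetical helper: rows of `frame` whose Country equals `country`.

    A country that is not present yields an empty DataFrame.
    """
    return frame[frame['Country'] == country].copy()

df = get_country_emissions(emissions, 'Chile')
print(df['Emissions'].tolist())  # [300, 200, 100]
```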
Upvotes: 1
Reputation: 19635
You can select rows using an index label stored in a variable. For example:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4]}, index=['Peru', 'Peru', 'zanzibar', 'zanzibar'])
country = 'zanzibar'
df.loc[[country]]
This will return:
          a
zanzibar  3
zanzibar  4
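In your case, the curly braces are the problem rather than loc itself: {country} builds a Python set containing the string, and current pandas versions refuse a set as an indexer. Pass the variable inside a plain list instead:
country = 'zanzibar'
df = emissions.loc[[country]]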
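Passing the variable inside a list, emissions.loc[[country]], can be checked on data shaped like the question's (country names as the index; the numbers are made up). The sketch also shows why the list matters: a list of labels always returns a DataFrame, while a bare label returns a Series when that label occurs only once. A label missing from the index raises KeyError, so the last part tests membership first.

```python
import pandas as pd

# Made-up data in the question's layout: country names form the index
emissions = pd.DataFrame(
    {'year': [2020, 2019, 2018, 2020],
     'emissions': [1000, 900, 800, 50]},
    index=['peru', 'peru', 'peru', 'zanzibar'],
)

country = 'zanzibar'

# A list of labels always returns a DataFrame
df = emissions.loc[[country]]
print(type(df).__name__)   # DataFrame

# A bare label returns a Series when the label occurs only once
row = emissions.loc[country]
print(type(row).__name__)  # Series

# Guard against names that are not in the index
other = 'atlantis'
if other in emissions.index:
    print(emissions.loc[[other]])
else:
    print(f'{other} not found')
```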
Upvotes: 0