Lavrans Grjotheim

Reputation: 9

Select rows by a variable name using loc in a pandas dataframe

I have a pandas dataframe that consists of 5000 rows with different countries and emission data, and looks like the following:

country  year  emissions
peru     2020  1000
         2019  900
         2018  800

The country label is an index.

E.g. df = emissions.loc[['peru']]

would give me a new dataframe consisting only of the emission data attached to peru. My goal is to use a variable name instead of 'peru' and store the country-specific emission data into a new dataframe.

What I am looking for is code that would work the same way as the code below:

country = 'zanzibar'

df = emissions.loc[[{country}]]

From what I can tell, the problem arises with the loc indexer, which does not seem to accept variables as input. Is there a way I could circumvent this problem?

In other words, I want to be able to create a new dataframe with country-specific emission data, based on a variable that matches one of the countries in emissions.index, all without having to change anything but the given variable.

One way could be to iterate through or maybe create a function in some way? Thank you in advance for any help.

Upvotes: 0

Views: 971

Answers (3)

Reyhansyah

Reputation: 1

I'm not sure this matches your question exactly, but here is a way to turn each country name into a variable.

Because a variable name can't contain a space (" "), you first have to replace every space with an underscore ("_").

(Just in case some of your 'country' values are country names of more than one word.)

Example:

  • United Kingdom to United_Kingdom

by using this code:

df['country'] = df['country'].replace(' ', '_', regex=True)

Once the country names are in the new format, you can collect all of them from the dataframe with .unique() and store the result in a new variable:

country_name = df['country'].unique()

After this, all the unique values in the 'country' column are stored in an array called 'country_name'.

Next,

Use a for loop to create a new variable for each country name:

for i in country_name:
    # at module level, assigning into locals() creates a variable named after the string i
    locals()[i] = df[df['country'] == i]

Here locals() returns the dictionary of the current namespace, so locals()[i] = ... binds a variable whose name is the string i (the entries in 'country_name' are strings), and df[df['country'] == i] subsets the dataframe to the rows where country equals that value. Note that this only works reliably at module level; inside a function, writing to locals() does not create local variables.

After that, a new variable exists for each country name in the 'country' column.

Hopefully this helps solve your problem.
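As an alternative to writing into locals(), the per-country dataframes can be kept in a dictionary keyed by country name. This is a different technique from the one above, sketched here under the assumption that 'country' is a regular column; the sample data and the name `frames` are made up for illustration. It also avoids renaming countries, since dictionary keys may contain spaces.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions for illustration.
df = pd.DataFrame({
    "country": ["peru", "peru", "united kingdom", "united kingdom"],
    "year": [2020, 2019, 2020, 2019],
    "emissions": [1000, 900, 500, 450],
})

# One sub-dataframe per country, keyed by the country name.
frames = {name: group for name, group in df.groupby("country")}

# Look up a country through an ordinary string variable.
country = "united kingdom"
uk = frames[country]
```

With this layout, changing the `country` variable is all that is needed to pick a different sub-dataframe.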

Upvotes: 0

Pepsi-Joe

Reputation: 447

An alternative approach where you don't use the country name as your index:

import pandas as pd

emissions = pd.DataFrame({'Country' : ['Peru', 'Peru', 'Peru', 'Chile', 'Chile', 'Chile'], "Year" : [2021,2020,2019,2021,2020,2019], 'Emissions' : [100,200,400,300,200,100]})
country = 'Peru'

Then to filter:

df = emissions[emissions.Country == country]

or

df = emissions.loc[emissions.Country == country]

Giving:

   Country  Year  Emissions
0  Peru     2021  100
1  Peru     2020  200
2  Peru     2019  400
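The same boolean-mask idea extends to several countries at once. The following is a minimal sketch, assuming 'Country' is a regular column as in the example above; the helper name `filter_countries` is hypothetical.

```python
import pandas as pd

emissions = pd.DataFrame({
    "Country": ["Peru", "Peru", "Peru", "Chile", "Chile", "Chile"],
    "Year": [2021, 2020, 2019, 2021, 2020, 2019],
    "Emissions": [100, 200, 400, 300, 200, 100],
})

def filter_countries(frame, countries):
    """Return the rows whose 'Country' value is in the given list."""
    return frame[frame["Country"].isin(countries)]

# A single country is passed as a one-element list.
peru = filter_countries(emissions, ["Peru"])
both = filter_countries(emissions, ["Peru", "Chile"])
```

Using isin keeps the call the same whether one or many countries are requested.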

Upvotes: 1

Derek O

Reputation: 19635

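You should be able to select by a string label in your index. For example:

import pandas as pd

df = pd.DataFrame({'a':[1,2,3,4]}, index=['Peru','Peru','zanzibar','zanzibar'])
country = 'zanzibar'
df.loc[[country]]  # a list containing the variable; no curly braces

This will return:

          a
zanzibar  3
zanzibar  4

In your case, removing the curly braces should work. {country} builds a Python set, and recent pandas versions (2.0 and later) raise a TypeError when a set is passed as an indexer:

country = 'zanzibar'
df = emissions.loc[[country]]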
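Since the question also asks about wrapping this in a function, here is a minimal sketch of selecting index rows by a variable. It assumes the country is the index, as in the question; the sample data and the helper name `select_country` are made up for illustration. The variable is passed inside a list so the result is always a dataframe.

```python
import pandas as pd

# Hypothetical sample data with the country as the index.
emissions = pd.DataFrame(
    {"year": [2020, 2019, 2020, 2019], "emissions": [1000, 900, 700, 650]},
    index=pd.Index(["peru", "peru", "zanzibar", "zanzibar"], name="country"),
)

def select_country(frame, country):
    """Return all rows whose index label equals `country` as a DataFrame."""
    if country not in frame.index:
        raise KeyError(f"{country!r} is not in the index")
    return frame.loc[[country]]

country = "zanzibar"
df = select_country(emissions, country)
```

Only the `country` variable needs to change to get a different country's data.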

Upvotes: 0
