Reputation: 65
I have been taking online classes at DataCamp for Python data science, but when I take the same code that I use there and run it on my own computer (as opposed to their website), I get errors that I do not understand. I am using Spyder and Python 3.6.
The goal of my code is to import a .csv file, extract two rows and two columns from the pandas dataframe and print out the results. From there I can graph the data on a histogram, and then expand it. But first, I have to get the basics to work. The code I have been using is:
import pandas as pd
df = pd.read_csv('drinks.csv')
df1 = df.loc[['USA', 'Germany'], ['country', 'beer_servings']]
print(df1)
The error I get is:
KeyError: "None of [['USA', 'Germany']] are in the [index]"
In case anyone wants to see the data I am using, the link I used to download it is: https://github.com/fivethirtyeight/data/blob/master/alcohol-consumption/drinks.csv
Even if I go as simple as I possibly can and just extract a single row, I still get the same error (as seen below). The same exact thing happens if I try to extract a single column.
import pandas as pd
df = pd.read_csv('drinks.csv')
df1 = df.loc[['USA']]
print(df1)
The error is:
KeyError: "None of [['USA']] are in the [index]"
Is there something I'm missing?
https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/
This is the website I was using to try and understand what I was doing wrong, but for the life of me I cannot figure out what I am missing. I understand that this is probably a very trivial problem, but please if you have any advice I would love to hear it, thanks in advance for any help!
Upvotes: 0
Views: 1336
Reputation: 61
Try:
>>> df.loc[df['country'].isin(['USA', 'Germany']), ['country', 'beer_servings']]
     country  beer_servings
65   Germany            346
184      USA            249
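For anyone who wants to try this without downloading the file, here is a minimal sketch of the same idea. It assumes a tiny inline stand-in for drinks.csv (the `csv_text` string and the France row are illustrative, not the full dataset) and shows why the original lookup fails: the row labels are plain integers, so the country has to be matched through a boolean mask on the column instead.

```python
import io
import pandas as pd

# Inline stand-in for drinks.csv (assumption: only two columns, three rows).
csv_text = """country,beer_servings
France,127
Germany,346
USA,249
"""
df = pd.read_csv(io.StringIO(csv_text))

# read_csv assigns a default integer index, so 'USA' is not a row label.
print(df.index.tolist())

# Build a boolean mask from the 'country' column and use it as the row selector.
mask = df['country'].isin(['USA', 'Germany'])
subset = df.loc[mask, ['country', 'beer_servings']]
print(subset)
```

The rows come back in file order (Germany before USA), not in the order of the list passed to isin.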
Upvotes: 0
Reputation: 13913
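You need to set the country column as the index first. read_csv gives the dataframe a default integer index, so .loc cannot find 'USA' or 'Germany' among the row labels until you do: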
import pandas as pd
df = pd.read_csv('drinks.csv').set_index('country')
df1 = df.loc[['USA', 'Germany'], 'beer_servings']
print(df1)
Output:
country
USA        249
Germany    346
Name: beer_servings, dtype: int64
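As a variation, the index can also be set while loading by passing index_col to read_csv. The sketch below is only an illustration: `csv_text` is a small inline stand-in for drinks.csv (the France row is illustrative), and the first half reproduces the KeyError from the question so the difference is visible side by side.

```python
import io
import pandas as pd

# Inline stand-in for drinks.csv (assumption: only two columns, three rows).
csv_text = """country,beer_servings
France,127
Germany,346
USA,249
"""

# With the default integer index, a label lookup by country name fails.
default_df = pd.read_csv(io.StringIO(csv_text))
lookup_failed = False
try:
    default_df.loc[['USA']]
except KeyError as exc:
    lookup_failed = True
    print('KeyError:', exc)

# index_col makes 'country' the index at load time, like calling set_index afterwards.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col='country')
beer = indexed_df.loc[['USA', 'Germany'], 'beer_servings']
print(beer)
```

Unlike a boolean mask, a list of labels passed to .loc returns the rows in the order requested (USA first, then Germany).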
Upvotes: 1
Reputation: 21739
You can do:
df1 = df.loc[df['country'].isin(['USA', 'Germany']), ['country', 'beer_servings']]
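Or, you can call set_index first so that your label-based row selection works. Note that 'country' then becomes the index rather than a column, so drop it from the column list in .loc.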
df = df.set_index('country')
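If the original column list should stay exactly as written in the question, one option is to keep 'country' as a column while also using it as the index, via the drop=False argument of set_index. This is a sketch under the assumption that a small inline sample (`csv_text`, with an illustrative France row) stands in for drinks.csv.

```python
import io
import pandas as pd

# Inline stand-in for drinks.csv (assumption: only two columns, three rows).
csv_text = """country,beer_servings
France,127
Germany,346
USA,249
"""

# drop=False keeps 'country' as a regular column in addition to making it the index.
df = pd.read_csv(io.StringIO(csv_text)).set_index('country', drop=False)

# The selection from the question now works unchanged.
df1 = df.loc[['USA', 'Germany'], ['country', 'beer_servings']]
print(df1)
```

The trade-off is that 'country' now names both the index and a column, which some later operations that refer to it by name may treat as ambiguous.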
Upvotes: 0