Tom D

Reputation: 65

Error when trying to extract specific columns/rows from Pandas dataframe using .loc

I have been taking online classes at DataCamp for Python data science, but when I run the same code on my own computer instead of their website, I get errors that I do not understand. I am using Spyder and Python 3.6.

The goal of my code is to import a .csv file, extract two rows and two columns from the pandas dataframe and print out the results. From there I can graph the data on a histogram, and then expand it. But first, I have to get the basics to work. The code I have been using is:

import pandas as pd

df = pd.read_csv('drinks.csv')
df1 = df.loc[['USA', 'Germany'], ['country', 'beer_servings']]
print(df1)

The error I get is:

KeyError: "None of [['USA', 'Germany']] are in the [index]"

In case anyone wants to see the data I am using, the link I used to download it is: https://github.com/fivethirtyeight/data/blob/master/alcohol-consumption/drinks.csv

Even if I go as simple as I possibly can and just extract a single row, I still get the same error (as seen below). The same exact thing happens if I try to extract a single column.

import pandas as pd

df = pd.read_csv('drinks.csv')
df1 = df.loc[['USA']]
print(df1)

The error is:

KeyError: "None of [['USA']] are in the [index]"

Is there something I'm missing?

https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/

This is the website I was using to try to understand what I was doing wrong, but I cannot figure out what I am missing. This is probably a trivial problem, but I would appreciate any advice. Thanks in advance for any help!

Upvotes: 0

Views: 1336

Answers (3)

Vinicius Barcelos

Reputation: 61

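.loc selects rows by index label, and after read_csv the index is just the default integers 0, 1, 2, ..., so 'USA' and 'Germany' are not valid row labels. Filter on the country column instead: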

>>> df.loc[df['country'].isin(['USA', 'Germany']), ['country', 'beer_servings']]
     country  beer_servings
65   Germany            346
184      USA            249
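
To see why the lookup in the question fails, it can help to inspect the index directly. The following is only a sketch: it assumes a small inline stand-in for drinks.csv (the two rows shown above plus a made-up placeholder row), so the row numbers differ from the real file.

```python
import io
import pandas as pd

# Inline stand-in for drinks.csv. The 'Placeholder' row is invented
# purely for illustration and is not part of the real data set.
csv_text = """country,beer_servings
Germany,346
USA,249
Placeholder,0
"""

df = pd.read_csv(io.StringIO(csv_text))

# With no index column specified, pandas builds a default integer index.
print(df.index)

# Country names live in a regular column, not in the index,
# which is why df.loc[['USA']] raises a KeyError.
print('USA' in df.index)

# A boolean mask on the column selects the matching rows instead.
mask = df['country'].isin(['USA', 'Germany'])
df1 = df.loc[mask, ['country', 'beer_servings']]
print(df1)
```

The mask is a Series of True/False values aligned with the rows, so .loc keeps the rows where it is True regardless of what the index labels are.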

Upvotes: 0

Alicia Garcia-Raboso

Reputation: 13913

.loc looks rows up by index label, so you need to make the country column the index first:

import pandas as pd

df = pd.read_csv('drinks.csv').set_index('country')
df1 = df.loc[['USA', 'Germany'], 'beer_servings']
print(df1)

Output:

country
USA        249
Germany    346
Name: beer_servings, dtype: int64
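
As a variation on the same idea, the index can also be set while reading the file via the index_col parameter of read_csv. The sketch below assumes an inline stand-in for drinks.csv (the 'Placeholder' row is made up) and also shows that a single column label returns a Series, while a one-element list of labels returns a DataFrame.

```python
import io
import pandas as pd

# Inline stand-in for drinks.csv. The 'Placeholder' row is invented
# for illustration only.
csv_text = """country,beer_servings
Germany,346
USA,249
Placeholder,0
"""

# index_col makes 'country' the index at load time,
# equivalent to calling .set_index('country') afterwards.
df = pd.read_csv(io.StringIO(csv_text), index_col='country')

# A scalar column label gives a Series ...
as_series = df.loc[['USA', 'Germany'], 'beer_servings']

# ... while a list of column labels gives a DataFrame.
as_frame = df.loc[['USA', 'Germany'], ['beer_servings']]

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

Which form is more convenient depends on what comes next; for example, plotting a histogram works with either.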

Upvotes: 1

YOLO

Reputation: 21739

You can filter on the country column with a boolean mask:

df1 = df.loc[df['country'].isin(['USA', 'Germany']), ['country', 'beer_servings']]

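Or, you can call set_index first so that the country names become the row labels. By default set_index removes 'country' from the columns, so your existing .loc call, which also asks for the 'country' column, only works unchanged if you pass drop=False:

df = df.set_index('country', drop=False)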
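
One practical difference between the two approaches is row order: the isin mask keeps the rows in file order, while .loc with a list of labels returns them in the order requested. A small sketch, assuming a two-row frame built from the values shown in this thread rather than the full file:

```python
import pandas as pd

# Two-row stand-in built from the values quoted in the answers.
df = pd.DataFrame({
    'country': ['Germany', 'USA'],
    'beer_servings': [346, 249],
})

# Approach 1: boolean mask on the column (rows stay in file order).
by_mask = df.loc[df['country'].isin(['USA', 'Germany']),
                 ['country', 'beer_servings']]

# Approach 2: label lookup on the index (rows follow the requested order);
# reset_index turns 'country' back into a regular column afterwards.
by_label = (
    df.set_index('country')
      .loc[['USA', 'Germany'], ['beer_servings']]
      .reset_index()
)

print(by_mask['country'].tolist())
print(by_label['country'].tolist())
```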

Upvotes: 0
