Syed Ahmed Ali Shah

Reputation: 69

KeyError: "None of [Int64Index([1960, 1961, 1962, 1963, 1964], dtype='int64')] are in the [columns]"

I am trying to draw a scatter plot from a DataFrame, passing it the x and y components. The call raises the error above, and it points at the x component, the 'Year' column. I have checked manually that the 'Year' column exists in the DataFrame, yet the error persists. Note that the 'Year' column contains the years 1960 to 1964.

import pandas as pd
import matplotlib.pyplot as plt

urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)
df_urb_pop = next(urb_pop_reader)
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']
pops = zip(df_pop_ceb['Total Population'], 
           df_pop_ceb['Urban population (% of total)'])
pops_list = list(pops)

# Use list comprehension to create new DataFrame column 'Total Urban Population'
df_pop_ceb['Total Urban Population'] = [int(a[0]*(a[1]*0.01)) for a in pops_list]

# Plot urban population data
df_pop_ceb.plot(kind='scatter', x=df_pop_ceb['Year'], y=df_pop_ceb['Total Urban Population'])
plt.show()

Upvotes: 2

Views: 1491

Answers (2)

Dan

Reputation: 45741

If you want to use pandas' plotting, you should pass the labels as x and y, not the data:

df_pop_ceb.plot(kind='scatter', x='Year', y='Total Urban Population')

Also, looking at the docs, I think you should rather use the plot.scatter accessor:

df_pop_ceb.plot.scatter(x='Year', y='Total Urban Population')
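To see where the message comes from, here is a minimal sketch using a small made-up DataFrame (the values are illustrative, not taken from ind_pop_data.csv). It assumes that pandas ends up using the x argument as a column key, which is what the error text suggests: indexing a DataFrame with a Series of years makes pandas look for columns named 1960, 1961, and so on.

```python
import pandas as pd

# Made-up sample data; only the column names mirror the question.
df = pd.DataFrame({
    "Year": [1960, 1961, 1962, 1963, 1964],
    "Total Urban Population": [10, 11, 12, 13, 14],
})

# A label selects the column named 'Year'.
years = df["Year"]

# A Series used as a key is treated as a list of column labels,
# so pandas searches for columns called 1960, 1961, ... and fails.
try:
    df[df["Year"]]
    lookup_error = None
except KeyError as exc:
    lookup_error = exc

print(type(lookup_error).__name__)
```

Passing x='Year' avoids this lookup problem because the string is itself a valid column label.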

Upvotes: 4

Adam B.

Reputation: 192

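The error is raised because DataFrame.plot expects column labels for x and y, while you are passing the data itself. If you would rather pass the data directly, use matplotlib's pyplot interface instead:

import matplotlib.pyplot as plt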
plt.scatter(x=df_pop_ceb['Year'], y=df_pop_ceb['Total Urban Population'])
plt.title('Title')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

Also, there's no need to zip to calculate the total urban population. You could just multiply both columns directly:

df_pop_ceb['Total Urban Population'] = (df_pop_ceb['Total Population']*df_pop_ceb['Urban population (% of total)']*0.01)
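A side note on the vectorised version: df_pop_ceb is a filtered slice of df_urb_pop, so assigning a new column to it can trigger a SettingWithCopyWarning in some pandas versions. Below is a minimal sketch with made-up numbers (not the real data) that takes an explicit copy first and keeps the int() truncation from the question.

```python
import pandas as pd

# Made-up sample rows; only the column names mirror the question.
df_urb_pop = pd.DataFrame({
    "CountryCode": ["CEB", "CEB", "ARB"],
    "Year": [1960, 1961, 1960],
    "Total Population": [1000, 4000, 3000],
    "Urban population (% of total)": [50.0, 25.0, 30.0],
})

# .copy() makes the filtered frame independent of df_urb_pop.
df_pop_ceb = df_urb_pop[df_urb_pop["CountryCode"] == "CEB"].copy()

# Same arithmetic as the list comprehension, applied to whole columns;
# astype(int) truncates toward zero like int() does.
df_pop_ceb["Total Urban Population"] = (
    df_pop_ceb["Total Population"]
    * (df_pop_ceb["Urban population (% of total)"] * 0.01)
).astype(int)

print(df_pop_ceb[["Year", "Total Urban Population"]])
```

Because of the copy, the new column is added only to df_pop_ceb and df_urb_pop is left unchanged.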

Hope that helps!

Upvotes: 0
