Reputation: 69
I am trying to make a scatter plot from a DataFrame, and for this I pass it x and y components. It raises an error on the x component, specifically on the 'Year' column. I have checked manually that the Year column exists in the DataFrame, yet the error persists. Note that the Year column contains the years 1960 to 1964.
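import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV in chunks of 1000 rows and take only the first chunk
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)
df_urb_pop = next(urb_pop_reader)
# Keep only the rows for country code 'CEB'
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']
pops = zip(df_pop_ceb['Total Population'],
           df_pop_ceb['Urban population (% of total)'])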
pops_list = list(pops)
# Use list comprehension to create new DataFrame column 'Total Urban Population'
df_pop_ceb['Total Urban Population'] = [int(a[0]*(a[1]*0.01)) for a in pops_list]
# Plot urban population data
df_pop_ceb.plot(kind='scatter', x=df_pop_ceb['Year'], y=df_pop_ceb['Total Urban Population'])
plt.show()
Upvotes: 2
Views: 1491
Reputation: 45741
If you want to use pandas' plotting, you should pass the column labels as x and y, not the Series holding the data:
df_pop_ceb.plot(kind='scatter', x='Year', y='Total Urban Population')
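A minimal self-contained sketch of the label-based call. The DataFrame below is a hypothetical stand-in for df_pop_ceb (the numbers are made-up placeholders, not values from ind_pop_data.csv), and the "Agg" backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df_pop_ceb; values are placeholders, not real data
df_pop_ceb = pd.DataFrame({
    "Year": [1960, 1961, 1962, 1963, 1964],
    "Total Urban Population": [100, 110, 120, 130, 140],
})

# Pass column *labels*; pandas looks the columns up in the DataFrame itself
ax = df_pop_ceb.plot(kind="scatter", x="Year", y="Total Urban Population")

# pandas labels the axes with the column names it was given
print(ax.get_xlabel(), "|", ax.get_ylabel())
plt.close(ax.figure)  # plt.show() instead in an interactive session
```

Because the arguments are labels, pandas also reuses them as axis titles, which it could not do if it were handed bare data.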
Looking at the docs, the dedicated scatter method is another way to write the same call:
df_pop_ceb.plot.scatter(x='Year', y='Total Urban Population')
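As a quick sanity check that the two spellings draw the same points, here is a sketch on the same kind of hypothetical placeholder data, comparing the point coordinates stored in each plot's scatter collection.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the check runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical placeholder data, not taken from ind_pop_data.csv
df = pd.DataFrame({
    "Year": [1960, 1961, 1962, 1963, 1964],
    "Total Urban Population": [100, 110, 120, 130, 140],
})

# Without an ax argument, each call creates its own figure and Axes
ax1 = df.plot(kind="scatter", x="Year", y="Total Urban Population")
ax2 = df.plot.scatter(x="Year", y="Total Urban Population")

# The scatter points live in the first collection of each Axes
pts1 = ax1.collections[0].get_offsets()
pts2 = ax2.collections[0].get_offsets()
print(np.array_equal(pts1, pts2))
plt.close("all")
```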
Upvotes: 4
Reputation: 192
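The error is raised because DataFrame.plot looks up x and y as column labels, and you passed it the column data instead. If you'd rather pass the data itself, call matplotlib's scatter function directly:
import matplotlib.pyplot as plt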
plt.scatter(x=df_pop_ceb['Year'], y=df_pop_ceb['Total Urban Population'])
plt.title('Title')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
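The same plot can also be sketched with matplotlib's object-oriented interface, where plt.subplots() returns an explicit Figure and Axes to draw on. The DataFrame is again hypothetical placeholder data standing in for df_pop_ceb, and the "Agg" backend is only there so the sketch runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df_pop_ceb; values are placeholders
df_pop_ceb = pd.DataFrame({
    "Year": [1960, 1961, 1962, 1963, 1964],
    "Total Urban Population": [100, 110, 120, 130, 140],
})

fig, ax = plt.subplots()
# matplotlib expects the data itself (here two Series), not column labels
ax.scatter(df_pop_ceb["Year"], df_pop_ceb["Total Urban Population"])
ax.set_title("Title")
ax.set_xlabel("Year")
ax.set_ylabel("Total Urban Population")
# plt.show() would display the figure in an interactive session
plt.close(fig)
```

Working on an explicit ax makes it easier to put several plots on one figure later, since nothing depends on which figure pyplot considers "current".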
Also, there's no need to zip to calculate the total urban population. You could just multiply both columns directly:
df_pop_ceb['Total Urban Population'] = (df_pop_ceb['Total Population']*df_pop_ceb['Urban population (% of total)']*0.01).astype(int)
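A small sketch comparing the question's zip/list-comprehension approach with whole-column arithmetic. The rows are hypothetical placeholders; the vectorized line keeps the question's order of operations, total * (pct * 0.01), so both versions perform identical floating-point steps, and .astype(int) truncates the same way int() does for these positive values.

```python
import pandas as pd

# Hypothetical rows standing in for df_pop_ceb (placeholder numbers)
df = pd.DataFrame({
    "Total Population": [1_000_000, 2_000_000, 3_000_000],
    "Urban population (% of total)": [40.0, 45.5, 50.25],
})

# Row-by-row version from the question: zip the columns, then a list comprehension
looped = [int(total * (pct * 0.01))
          for total, pct in zip(df["Total Population"],
                                df["Urban population (% of total)"])]

# Vectorized version: one arithmetic expression over whole columns,
# then truncate to integers to match int() above
vectorized = (df["Total Population"]
              * (df["Urban population (% of total)"] * 0.01)).astype(int)

print(looped)
print(vectorized.tolist())
```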
Hope that helps!
Upvotes: 0