Reputation: 1
I am new to data visualization, so please bear with me. I am trying to create a plot that shows various attributes of a data set on blockbuster movies. The x-axis will be the year of the movie and the y-axis will be the worldwide gross. Some movies have grossed upwards of a billion, and my y-axis seems overwhelmed: the tick labels completely block each other out and become illegible. Here is what I have thus far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('blockbusters.csv')
fig, ax = plt.subplots()
ax.set_title('Top Grossing Films')
ax.set_xlabel('Year')
ax.set_ylabel('Worldwide Grossing')
x = df['year']  # x-axis values
y = df['worldwide_gross']  # y-axis values
ax.scatter(x, y)  # draw the data; the plot type (scatter) is an assumption
plt.show()
Any tips on how to scale this down? Ideally it could be presented on a scale of 10. Thanks in advance!
Upvotes: 0
Views: 305
Reputation: 3892
You could try logarithmic scaling:
ax.set_yscale('log')
You might want to set the ticks on the y-axis manually (do this after calling set_yscale, since changing the scale resets the tick locations):
ax.set_yticks(tick_values)  # list of values at which you want a tick
ax.set_yticklabels(tick_labels)  # optional: list of labels, one per tick
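To make the two suggestions above concrete, here is a minimal sketch that combines the log scale with hand-picked ticks. The DataFrame is a made-up stand-in for blockbusters.csv, the scatter plot is an assumption about the plot type, and the tick values and labels are only one possible choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data standing in for blockbusters.csv (not real figures)
df = pd.DataFrame({
    "year": [2000, 2005, 2010, 2015],
    "worldwide_gross": [1.5e8, 6.0e8, 1.2e9, 2.5e9],
})

fig, ax = plt.subplots()
ax.scatter(df["year"], df["worldwide_gross"])  # assumed plot type
ax.set_title("Top Grossing Films")
ax.set_xlabel("Year")
ax.set_ylabel("Worldwide Gross")

# Switch to a log scale first, then set the ticks:
# set_yscale installs new default tick locators
ax.set_yscale("log")

ticks = [1e8, 3e8, 1e9, 3e9]
labels = ["$100M", "$300M", "$1B", "$3B"]
ax.set_yticks(ticks)
ax.set_yticklabels(labels)
ax.minorticks_off()  # hide minor ticks so only the chosen ticks remain
```

In an interactive session you would finish with plt.show(). The short labels keep the axis readable even when the values span more than one order of magnitude.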
Another approach is to rank the movies by gross (highest, second highest, ...) and plot the rank on the y-axis instead of the raw value, i.e. on the y-axis you would plot
df['worldwide_gross'].rank()
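A small sketch of the ranking idea, using made-up titles and values. Note that rank() ranks in ascending order by default, so the lowest gross gets rank 1; passing ascending=False is one way to give rank 1 to the highest-grossing film instead. The column name gross_rank is hypothetical.

```python
import pandas as pd

# Made-up sample data (not real figures)
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "worldwide_gross": [1.5e8, 2.5e9, 6.0e8, 1.2e9],
})

# ascending=False: the highest gross gets rank 1.
# method="min" gives tied values the same (lowest) rank, so the cast to int is safe.
df["gross_rank"] = (
    df["worldwide_gross"].rank(ascending=False, method="min").astype(int)
)

print(df.sort_values("gross_rank"))
```

The ranks always run from 1 to the number of films, so the y-axis stays compact no matter how large the underlying numbers are, at the cost of hiding the actual distances between films.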
Edit: as you indicate, one might also check the dtypes to make sure the data is numerical. If not, use .astype(int) or .astype(float) to convert it.
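For the dtype check, here is a hedged sketch. It assumes the column was read in as strings containing a dollar sign and thousands separators, which is only a guess about what the CSV looks like; in that case .astype(float) alone would raise an error, so the text is cleaned first and then converted with pd.to_numeric.

```python
import pandas as pd

# Made-up sample; the "$" and "," formatting is an assumption about the CSV
df = pd.DataFrame({
    "worldwide_gross": ["$1,500,000,000.00", "$600,000,000.00", "$150,000,000.00"],
})

print(df.dtypes)  # a non-numeric dtype here means the values are text

# Strip "$" and "," so the remaining text parses as a number
cleaned = df["worldwide_gross"].str.replace(r"[$,]", "", regex=True)

# errors="coerce" turns anything that still fails to parse into NaN
df["worldwide_gross"] = pd.to_numeric(cleaned, errors="coerce")

print(df.dtypes)
```

If the column is stored as text, matplotlib treats each value as a separate category, which is another way to end up with an unreadable, overcrowded axis.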
Upvotes: 1