Reputation: 334
I want to create box plots using pandas. I have data with average temperatures, and I want to select three cities and draw one box plot per city to compare their temperatures. To achieve this, I created a result DataFrame to store the data, where each city's values are supposed to go into their own column (one column per city).
However, the following code only shows a plot for the first city. The problem is with the DataFrame: a separate query for each city correctly returns a Series of values, but when I insert it into the result DataFrame, a column of NaN values is stored instead. What am I missing here?
import pandas
import matplotlib.pyplot as plt
import wget
wget.download("https://raw.githubusercontent.com/pesikj/python-012021/master/zadani/5/temperature.csv")
temperatures = pandas.read_csv("temperature.csv")
helsinki = temperatures[temperatures["City"] == "Helsinki"]["AvgTemperature"]
miami = temperatures[temperatures["City"] == "Miami Beach"]["AvgTemperature"]
tokyo = temperatures[temperatures["City"] == "Tokyo"]["AvgTemperature"]
result = pandas.DataFrame()
result["Helsinki"] = helsinki
result["Miami Beach"] = miami
result["Tokyo"] = tokyo
result.plot(kind="box",whis=[0,100])
plt.show()
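For context, the NaN columns come from index alignment: assigning a Series to a DataFrame column matches rows by index label, not by position. The first assignment to the empty result makes it adopt Helsinki's row labels; the Miami Beach and Tokyo rows carry different labels from the original file, so nothing matches and those columns become NaN. A minimal sketch reproducing this on a small made-up frame (the data is hypothetical, not taken from temperature.csv), plus one possible fix using reset_index(drop=True):

```python
import pandas as pd

# Hypothetical miniature stand-in for temperature.csv: after boolean filtering,
# each city's Series keeps its original, non-overlapping row labels.
temperatures = pd.DataFrame({
    "City": ["Helsinki", "Helsinki", "Miami Beach", "Miami Beach", "Tokyo", "Tokyo"],
    "AvgTemperature": [29.6, 29.5, 74.6, 76.8, 59.1, 62.3],
})

helsinki = temperatures[temperatures["City"] == "Helsinki"]["AvgTemperature"]   # labels 0, 1
miami = temperatures[temperatures["City"] == "Miami Beach"]["AvgTemperature"]  # labels 2, 3
tokyo = temperatures[temperatures["City"] == "Tokyo"]["AvgTemperature"]        # labels 4, 5

# Reproduces the problem: the empty frame adopts helsinki's index (0, 1),
# and the other Series are aligned on those labels, so their values are all NaN.
result = pd.DataFrame()
result["Helsinki"] = helsinki
result["Miami Beach"] = miami
result["Tokyo"] = tokyo
print(result)

# One possible fix: discard the original labels so rows line up by position.
fixed = pd.DataFrame({
    "Helsinki": helsinki.reset_index(drop=True),
    "Miami Beach": miami.reset_index(drop=True),
    "Tokyo": tokyo.reset_index(drop=True),
})
print(fixed)
```

If the cities have different numbers of rows, the shorter columns in fixed are simply padded with NaN at the end instead of being wiped out entirely.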
Upvotes: 1
Views: 299
Reputation: 7509
Since you're using data science packages, consider using seaborn, which takes care of grouping the data for you whenever you call one of its plot functions:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load dataset directly from the URL
url = "https://raw.githubusercontent.com/pesikj/python-012021/master/zadani/5/temperature.csv"
temperatures = pd.read_csv(url)
# Filter for cities of interest
cities = ['Helsinki', 'Miami Beach', 'Tokyo']
filtered_temperatures = temperatures.loc[temperatures['City'].isin(cities)]
# Let seaborn do the grouping (sns.boxplot takes the same arguments if you want a box plot)
sns.violinplot(data=filtered_temperatures, x='City', y='AvgTemperature')
plt.show()
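The grouping step that seaborn handles here amounts to splitting AvgTemperature by City. A rough pandas-only sketch of that idea (the small frame below is made up for illustration, including the extra city "Oslo", and is not taken from temperature.csv) collects one array per city, which is the shape a box plot needs even when cities have different numbers of rows:

```python
import pandas as pd

# Hypothetical long-format data in the same shape as temperature.csv
# (one row per measurement, with a City column).
temperatures = pd.DataFrame({
    "City": ["Helsinki", "Helsinki", "Miami Beach", "Tokyo", "Tokyo", "Oslo"],
    "AvgTemperature": [29.6, 29.5, 74.6, 59.1, 62.3, 30.0],
})

cities = ["Helsinki", "Miami Beach", "Tokyo"]
filtered = temperatures.loc[temperatures["City"].isin(cities)]

# Split temperatures by city: iterating a SeriesGroupBy yields (key, Series) pairs,
# with keys in sorted order.
per_city = {
    city: values.to_numpy()
    for city, values in filtered.groupby("City")["AvgTemperature"]
}
for city, values in per_city.items():
    print(city, values)
```

Passing list(per_city.values()) to matplotlib's plt.boxplot would draw one box per city; seaborn and pandas plotting just do this bookkeeping for you.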
Upvotes: 1
Reputation: 41327
Pivot the cities into columns using pivot_table() and select the 3 cities you want:
result = temperatures.pivot_table(
index='Day',
columns='City',
values='AvgTemperature',
)[['Helsinki', 'Miami Beach', 'Tokyo']]
# City  Helsinki  Miami Beach  Tokyo
# Day
# 1         29.6         74.6   59.1
# 2         29.5         76.8   62.3
# ...
# 29        35.3         77.7   58.4
# 30        35.7         78.0   51.5
result.plot(kind='box', whis=[0, 100])
plt.show()
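pivot_table() reshapes the long table (one row per City and Day) into a wide one (one column per City), which is the layout DataFrame.plot(kind='box') expects. Note that pivot_table aggregates with the mean by default, so if a (Day, City) pair appears more than once, that cell holds the average of those rows rather than the raw values. A small sketch on made-up data (hypothetical values, not from temperature.csv) showing both the plain reshape and the averaging:

```python
import pandas as pd

# Hypothetical long-format data: one row per (Day, City) measurement,
# except Tokyo on day 2, which appears twice.
temperatures = pd.DataFrame({
    "Day": [1, 1, 1, 2, 2, 2, 2],
    "City": ["Helsinki", "Miami Beach", "Tokyo",
             "Helsinki", "Miami Beach", "Tokyo", "Tokyo"],
    "AvgTemperature": [29.6, 74.6, 59.1, 29.5, 76.8, 62.0, 64.0],
})

result = temperatures.pivot_table(
    index="Day",
    columns="City",
    values="AvgTemperature",  # aggfunc defaults to the mean
)[["Helsinki", "Miami Beach", "Tokyo"]]
print(result)  # Tokyo on day 2 is the mean of 62.0 and 64.0
```

DataFrame.pivot() takes the same index/columns/values arguments but raises a ValueError on duplicate (Day, City) pairs instead of averaging them, which can be a useful check that no rows are being silently combined.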
Upvotes: 2