Reputation: 1072
I am trying to compute the correlation between all the columns in this dataset except quality,
and then plot the frequency distribution of wine quality.
I am doing it the following way, but how do I exclude quality?
import pandas as pd
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv', sep=';')
df.corr()
It returns this output:
How can I graph the frequency distribution of wine quality with pandas?
I previously did this in R, where it worked fine, but I am using this dataset to learn pandas and Python:
winecor = cor(wine[-12])
hist(wine$quality)
So in R I get the following output, and I am looking for the same in Python.
Upvotes: 0
Views: 1010
Reputation: 6333
# Import plotting library
import matplotlib.pyplot as plt
### Option 1 - histogram
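# Quality scores in this file run from 3 to 9; edges 3..10 give one bin per score
# (with range(3, 10) the last bin would merge scores 8 and 9)
plt.hist(df['quality'], bins=range(3, 11))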
plt.show()
### Option 2 - bar plot (looks nicer)
# Get frequency per quality group
x = df.groupby('quality').size()
# Plot
plt.bar(x.index, x.values)
plt.show()
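Since the question asks for a pandas way, the same bar plot can also be sketched with pandas alone. This is a minimal sketch, not the only way to do it: the `quality` Series below is made-up sample data standing in for `df['quality']`, and the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import pandas as pd

# Hypothetical stand-in for df['quality']; the real values come from the CSV
quality = pd.Series([6, 5, 6, 7, 5, 6, 8, 6, 5, 7], name="quality")

# Count how often each score occurs, ordered by score rather than by count
counts = quality.value_counts().sort_index()

# Series.plot.bar draws through matplotlib and returns the Axes object
ax = counts.plot.bar()
ax.set_xlabel("quality")
ax.set_ylabel("frequency")
```

On the real data, replace `quality` with `df['quality']`, drop the `matplotlib.use` line, and call `plt.show()` afterwards in an interactive session.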
To get the correlation matrix of the features, excluding quality:
# Option 1 - very similar to R
df.iloc[:, :-1].corr()
# Option 2 - more Pythonic
df.drop('quality', axis=1).corr()
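To check that both options give the same result, here is a small sketch on a toy frame. The values are randomly generated for illustration only (they are not the wine data), and `toy` is a hypothetical name; it keeps `quality` as the last column, as in the CSV.

```python
import numpy as np
import pandas as pd

# Made-up data with the target as the last column, mirroring the CSV layout
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.2, 50),
    "pH": rng.normal(3.2, 0.15, 50),
    "quality": rng.integers(3, 10, 50),
})

# Drop the column by name, then correlate the remaining features
features_corr = toy.drop(columns="quality").corr()

# Positional variant: only equivalent while quality is the last column
by_position = toy.iloc[:, :-1].corr()
```

Dropping by name is the safer choice, because the positional slice silently removes a different column if the column order ever changes.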
Upvotes: 2
Reputation: 51
You can plot histograms with:
import matplotlib.pyplot as plt
plt.hist(x=df['quality'], bins=30)
plt.show()
See the plt.hist() documentation for a full description of its parameters.
Upvotes: 1