Reputation: 21
I'm trying to make a distribution histogram from two columns of a CSV file that has 5 columns and 300 rows in total. The CSV file looks roughly like this:
X Input Output X X
10.5 500
14 645.5
12.5 525
9.5 550
15.5 600
11 510.5
11 500
10.5 525
I want to make it using the columns Input and Output.
This is the code I tried:
import csv
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('results.csv')
plt.hist(data['Input'])
plt.hist(data['Output'])
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Distribution Histogram')
plt.show()
Unfortunately, the plot I get doesn't make much sense. I want a single histogram built from those two columns, where one column is treated as the value and the other depends on that value. Is there a way to do this without two separate plots?
Upvotes: 2
Views: 756
Reputation: 35636
Try DataFrame.plot with the kind set to 'hist':
import pandas as pd
import matplotlib.pyplot as plt
# Sample data copied from the rows shown in the question
data = pd.DataFrame({
'Input': {0: 10.5, 1: 14.0, 2: 12.5, 3: 9.5, 4: 15.5, 5: 11.0, 6: 11.0,
7: 10.5},
'Output': {0: 500.0, 1: 645.5, 2: 525.0, 3: 550.0, 4: 600.0, 5: 510.5,
6: 500.0, 7: 525.0}
})
data.plot(x='Input', y='Output', kind='hist')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Distribution Histogram')
plt.show()
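If the intent is for each row to contribute its Output value to the bin that its Input falls into (rather than a count of 1), one possible reading of the question is a weighted histogram. The sketch below is an assumption about what is wanted, not something stated in the question: it passes the Output column as `weights` to Matplotlib's `hist`, and the choice of `bins=4` is arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same sample rows as in the question
data = pd.DataFrame({
    'Input': [10.5, 14.0, 12.5, 9.5, 15.5, 11.0, 11.0, 10.5],
    'Output': [500.0, 645.5, 525.0, 550.0, 600.0, 510.5, 500.0, 525.0],
})

fig, ax = plt.subplots()
# Bin the rows by Input; each row adds its Output to its bin instead of 1.
# bins=4 is an arbitrary choice for this small sample.
counts, edges, _ = ax.hist(data['Input'], bins=4, weights=data['Output'])
ax.set_xlabel('Input')
ax.set_ylabel('Sum of Output')
ax.set_title('Output summed per Input bin')
```

Call plt.show() afterwards to display the figure. With the real file, replacing the sample DataFrame with pd.read_csv('results.csv') should work the same way, assuming the columns are actually named Input and Output.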
Upvotes: 1