Sergio

Reputation: 21

Make a distribution histogram of two columns

I'm trying to make a distribution histogram from two columns of a CSV file that has 5 columns and 300 rows in total. The CSV file looks roughly like this:

X  Input  Output  X  X
   10.5    500      
   14      645.5      
   12.5    525      
   9.5     550      
   15.5    600      
   11      510.5      
   11      500      
   10.5    525      
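Because the file repeats the `X` header, it can help to load it and keep only the two columns of interest before plotting. The sketch below is a minimal illustration that assumes the file is comma-separated with the header row shown above; the inline `SAMPLE_CSV` string is a hypothetical stand-in for `results.csv`, filled with the sample rows from the table.

```python
import io

import pandas as pd

# Hypothetical stand-in for results.csv, assuming a comma-separated file with
# the header row from the question and the unused "X" columns left empty.
SAMPLE_CSV = """X,Input,Output,X,X
,10.5,500,,
,14,645.5,,
,12.5,525,,
,9.5,550,,
,15.5,600,,
,11,510.5,,
,11,500,,
,10.5,525,,
"""

# pandas renames repeated headers (X, X.1, X.2), so the duplicates do not
# collide with each other or with the two columns we need.
raw = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Keep only the two columns the histogram is built from.
data = raw[["Input", "Output"]]
print(data.describe())
```

With the real file, `io.StringIO(SAMPLE_CSV)` would simply be replaced by the path `'results.csv'`.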

I want to build it from the Input and Output columns. This is the code I tried:

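import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('results.csv')
# Each call draws its own histogram on the same axes; in the sample rows above,
# Input (9.5-15.5) and Output (500-645.5) cover very different ranges.
plt.hist(data['Input'])
plt.hist(data['Output'])

plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Distribution Histogram')
plt.show()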

Unfortunately, the resulting plot doesn't make much sense: the two calls draw two separate histograms on the same axes. I want a single histogram built from both columns, where one column (Input) is treated as the value and the other (Output) depends on that value. Is there a way to do this without two separate plots?
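One possible reading of "one column is a value and the other depends on it" is a weighted histogram: bin the Input values and, instead of counting rows, add up the Output values that fall in each bin. This is a technique swapped in here for illustration, not something the question or the answer uses; the bin width of 2 is an arbitrary assumption, and the arrays are the sample rows from the table above.

```python
import numpy as np

# Sample values copied from the question's table.
inputs = np.array([10.5, 14.0, 12.5, 9.5, 15.5, 11.0, 11.0, 10.5])
outputs = np.array([500.0, 645.5, 525.0, 550.0, 600.0, 510.5, 500.0, 525.0])

# Bin edges along the Input axis (width 2 is an arbitrary choice):
# [9, 11), [11, 13), [13, 15), [15, 17]
edges = np.arange(9, 18, 2)

# counts: how many rows fall in each Input bin (an ordinary histogram).
counts, _ = np.histogram(inputs, bins=edges)

# totals: the sum of Output over the rows in each Input bin (weighted histogram).
totals, _ = np.histogram(inputs, bins=edges, weights=outputs)

for lo, hi, n, total in zip(edges[:-1], edges[1:], counts, totals):
    print(f"Input {lo}-{hi}: {n} rows, Output sum {total}")
```

matplotlib's `plt.hist` accepts the same `weights` argument, so `plt.hist(inputs, bins=edges, weights=outputs)` would draw these sums as bars on a single Input axis; dividing `totals` by `counts` would give the mean Output per bin instead.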

Upvotes: 2

Views: 756

Answers (1)

Henry Ecker

Reputation: 35636

Try DataFrame.plot with the kind set to 'hist':

import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame({
    'Input': {0: 10.5, 1: 14.0, 2: 12.5, 3: 9.5, 4: 15.5, 5: 11.0, 6: 11.0,
              7: 10.5},
    'Output': {0: 500.0, 1: 645.5, 2: 525.0, 3: 550.0, 4: 600.0, 5: 510.5,
               6: 500.0, 7: 525.0}
})

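# x='Input' only sets the index here; kind='hist' bins the values of y='Output'
data.plot(x='Input', y='Output', kind='hist')
plt.xlabel('Output')     # the bars span the range of Output values
plt.ylabel('Frequency')  # bar height = number of rows falling in each bin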
plt.title('Distribution Histogram')
plt.show()
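To check what a call like `data.plot(x='Input', y='Output', kind='hist')` actually draws, the sketch below renders it off-screen and compares the bar heights with a plain 10-bin histogram of the Output column alone. It assumes matplotlib's non-interactive Agg backend and pandas' default of 10 bins. In current pandas versions, passing `x` with a histogram only turns that column into the index of the selected data, so the bars count Output values rather than plotting Output against Input.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no window is opened

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "Input": [10.5, 14.0, 12.5, 9.5, 15.5, 11.0, 11.0, 10.5],
    "Output": [500.0, 645.5, 525.0, 550.0, 600.0, 510.5, 500.0, 525.0],
})

# Same call as in the answer; kind="hist" returns the matplotlib Axes.
ax = data.plot(x="Input", y="Output", kind="hist")

# Bar heights drawn by pandas (one Rectangle patch per bin).
heights = [patch.get_height() for patch in ax.patches]

# Plain 10-bin histogram of the Output column alone, for comparison.
counts, _ = np.histogram(data["Output"], bins=10)

print(heights)
print(counts.tolist())
plt.close(ax.figure)
```

If the two printed lists match, the plot is a distribution of Output only, and Input does not affect the bars.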

Upvotes: 1