Turning a scatter plot into a histogram in python

Question

I need to plot a histogram from some data in another file.

At the moment I have code that plots a scatter plot and fits a Gaussian.

The x-value is whatever the number is on the corresponding line from the data file it's reading in (after the first 12 lines of other information, i.e. Line 13 is the first event), and the y-value is the number of the line multiplied by a value.

Then plots and fits the scatter, but I need to be able to plot it as a histogram, and be able to change the bin width / number (i.e. add bin 1, 2, 3 and 4 together to have 1/4 of the bins overall with 4 times the numbers of events - so I guess adding together multiple lines from the data), which is where I am stuck.

How would I go about making this into the histogram and adjust the width / numbers?

Code below, didn't know how to make it pretty. Let me know if I can make it a bit easier to read.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from numpy import exp, loadtxt, pi, sqrt, random, linspace
from lmfit import Model
import glob, os

## Define gaussian
def gaussian(x, amp, cen, wid):
    """1-d gaussian: gaussian(x, amp, cen, wid)"""
    return (amp / (sqrt(2*pi) * wid)) * exp(-(x-cen)**2 / (2*wid**2))

## Define constants
stderrThreshold = 10
minimumAmplitude = 0.1
approxcen = 780
MaestroT = 53

## Define paramaters
amps = []; ampserr = []; ts = []
folderToAnalyze = baseFolder + fileToRun + '\'

## Generate the time array

for n in range(0, numfiles):
    
    ## Load text file
    x = np.linspace(0, 8191, 8192) 
    fullprefix = folderToAnalyze + prefix + str(n).zfill(3)
    y = loadtxt(fullprefix + ".Spe", skiprows= 12, max_rows = 8192) 

    ## Make figure
    fig, ax = plt.subplots(figsize=(15,8))
    fig.suptitle('Coincidence Detections', fontsize=20)
    plt.xlabel('Bins', fontsize=14)
    plt.ylabel('Counts', fontsize=14)

    ## Plot data
    ax.plot(x, y, 'bo')
    ax.set_xlim(600,1000)

    ## Fit data to Gaussian
    gmodel = Model(gaussian)
    result = gmodel.fit(y, x=x, amp=8, cen=approxcen, wid=1)

    ## Plot results and save figure
    ax.plot(x, result.best_fit, 'r-', label='best fit')
    ax.legend(loc='best')
    texttoplot = result.fit_report()
    ax.text(0.02, 0.5, texttoplot, transform=ax.transAxes)
    plt.close()
    fig.savefig(fullprefix + ".png", pad_inches='0.5')

Current Output: Scatter plots, that do show the expected distribution and plot of the data (they do however have a crappy reduced chi^2, but one problem at a time)

Expected Output: Histogram plots of the same data, with the same distribution and fitting, when each event is plotted as a separate bin, and hopefully the possibility to add these bins together to reduce error bars

Errors: N/A

Data: It's basically a standard distribution over 8192 lines. Full data for 1 file is here. Also the original .Spe file, the scatter plot and the full version of the code

2020-11-23 Update From Answer Comments:

Hi, I've been trying to implement this for a little while and come unstuck. I've tried to follow your example closely, however I am getting out histogram still with a bin width of 1 (i.e. not adding together). I also get a second blank graph in the printouts, and the reports only printout in the IDE (though i am working on that one, and reckon i will have it soon). Also for some reason, it seems to stop after 3 out of 50 iterations of the loop.
This is the code in it's current state:
This is the output I'm getting:

This is the original output:
And just in case it's useful, this is the raw data. I seem to be having trouble replicating your last 2 figure
The ideal would just be able to alter the constant on line 30, to whatever the desired bin width is, and have it run with that bin width on that occasion.

Trenton McKinney · Accepted Answer

The scatter plot, in this case, is a histogram, except with dots instead of bars.
.Spe is a bin count for each event.
x = np.linspace(0, 8191, 8192) defines the bins, and the bin width is 1.
Construct a bar plot instead of a scatter plot
- ax.bar(x, y) instead of ax.plot(x, y, 'bo')
As a result of the existing data, the following plot is a histogram with a very wide distribution.
- There are values ranging from 321 to 1585
- ax.set_xlim(300, 1800)

The benefit of this data is, it's easy to recreate the raw distribution based on x, a bin size of 1, and y being the respective count for each x.
np.repeat can create an array with repeat elements

import numpy
import matplotlib.pyplot as plt

# given x and y from the loop
# set the type as int
y = y.astype(int)
x = x.astype(int)

# create the data
data = np.repeat(x, y)

# determine the range of x
x_range = range(min(data), max(data)+1)

# determine the length of x
x_len = len(x_range)

# plot
fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(10, 10))

ax1.hist(data, bins=x_len)  # outliers are not plotted
ax2.boxplot(data, vert=False)
plt.show()

Given data, you can now perform whatever analysis is required.
SO: Fit gaussian to noisy data with lmfit
LMFIT Docs
Cross Validated might be a better site for diving into the model
All of the error calculation parameters come from the model result. If you calculate new x and y from np.histogram for different bin widths, that may affect the errors.
- approxcen = 780 is also an input to result

# given x_len determine how many bins for a given bin width
width = 8
bins = int(np.round(x_len / width))

# determine new x and y for the histogram
y, x = np.histogram(data, bins=bins)

# Fit data to Gaussian
gmodel = Model(gaussian)
result = gmodel.fit(y, x=x[:-1], amp=8, cen=approxcen, wid=1)

# result
print(result.fit_report())

[out]:
[[Model]]
    Model(gaussian)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 314
    # data points      = 158
    # variables        = 3
    chi-square         = 397.702574
    reduced chi-square = 2.56582306
    Akaike info crit   = 151.851284
    Bayesian info crit = 161.039069
[[Variables]]
    amp:  1174.80608 +/- 37.1663147 (3.16%) (init = 8)
    cen:  775.535731 +/- 0.46232727 (0.06%) (init = 780)
    wid:  12.6563219 +/- 0.46232727 (3.65%) (init = 1)
[[Correlations]] (unreported correlations are < 0.100)
    C(amp, wid) =  0.577

# plot
plt.figure(figsize=(10, 6))
plt.bar(x[:-1], y)
plt.plot(x[:-1], result.best_fit, 'r-', label='best fit')

plt.figure(figsize=(20, 8))
plt.bar(x[:-1], y)
plt.xlim(700, 850)
plt.plot(x[:-1], result.best_fit, 'r-', label='best fit')
plt.grid()

As we can see from the next code block, the error is related to the following parameters
- stderrThreshold = 10
- minimumAmplitude = 0.1
- MaestroT = 53

    ## Append to list if error in amplitude and amplitude itself is within reasonable bounds
    if result.params['amp'].stderr < stderrThreshold and result.params['amp'] > minimumAmplitude:
        amps.append(result.params['amp'].value) 
        ampserr.append(result.params['amp'].stderr) 
        ts.append(MaestroT*n)

Turning a scatter plot into a histogram in python

2020-11-23 Update From Answer Comments:

Answers (1)

Related Questions