Epideme
Epideme

Reputation: 211

Turning a scatter plot into a histogram in python

I need to plot a histogram from some data in another file.

At the moment I have code that plots a scatter plot and fits a Gaussian.

The x-value is whatever the number is on the corresponding line from the data file it's reading in (after the first 12 lines of other information, i.e. Line 13 is the first event), and the y-value is the number of the line multiplied by a value.

Then plots and fits the scatter, but I need to be able to plot it as a histogram, and be able to change the bin width / number (i.e. add bin 1, 2, 3 and 4 together to have 1/4 of the bins overall with 4 times the numbers of events - so I guess adding together multiple lines from the data), which is where I am stuck.

How would I go about making this into the histogram and adjust the width / numbers?

Code below, didn't know how to make it pretty. Let me know if I can make it a bit easier to read.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from numpy import exp, loadtxt, pi, sqrt, random, linspace
from lmfit import Model
import glob, os

## Define gaussian
def gaussian(x, amp, cen, wid):
    """1-d gaussian: gaussian(x, amp, cen, wid)"""
    return (amp / (sqrt(2*pi) * wid)) * exp(-(x-cen)**2 / (2*wid**2))

## Define constants
stderrThreshold = 10
minimumAmplitude = 0.1
approxcen = 780
MaestroT = 53

## Define paramaters
amps = []; ampserr = []; ts = []
folderToAnalyze = baseFolder + fileToRun + '\\'

## Generate the time array

for n in range(0, numfiles):
    
    ## Load text file
    x = np.linspace(0, 8191, 8192) 
    fullprefix = folderToAnalyze + prefix + str(n).zfill(3)
    y = loadtxt(fullprefix + ".Spe", skiprows= 12, max_rows = 8192) 

    ## Make figure
    fig, ax = plt.subplots(figsize=(15,8))
    fig.suptitle('Coincidence Detections', fontsize=20)
    plt.xlabel('Bins', fontsize=14)
    plt.ylabel('Counts', fontsize=14)

    ## Plot data
    ax.plot(x, y, 'bo')
    ax.set_xlim(600,1000)

    ## Fit data to Gaussian
    gmodel = Model(gaussian)
    result = gmodel.fit(y, x=x, amp=8, cen=approxcen, wid=1)

    ## Plot results and save figure
    ax.plot(x, result.best_fit, 'r-', label='best fit')
    ax.legend(loc='best')
    texttoplot = result.fit_report()
    ax.text(0.02, 0.5, texttoplot, transform=ax.transAxes)
    plt.close()
    fig.savefig(fullprefix + ".png", pad_inches='0.5')    

Current Output: Scatter plots, that do show the expected distribution and plot of the data (they do however have a crappy reduced chi^2, but one problem at a time)

Expected Output: Histogram plots of the same data, with the same distribution and fitting, when each event is plotted as a separate bin, and hopefully the possibility to add these bins together to reduce error bars

Errors: N/A

Data: It's basically a standard distribution over 8192 lines. Full data for 1 file is here. Also the original .Spe file, the scatter plot and the full version of the code

2020-11-23 Update From Answer Comments:

enter image description here

Upvotes: 1

Views: 3336

Answers (1)

Trenton McKinney
Trenton McKinney

Reputation: 62403

  • The scatter plot, in this case, is a histogram, except with dots instead of bars.
  • .Spe is a bin count for each event.
  • x = np.linspace(0, 8191, 8192) defines the bins, and the bin width is 1.
  • Construct a bar plot instead of a scatter plot
    • ax.bar(x, y) instead of ax.plot(x, y, 'bo')
  • As a result of the existing data, the following plot is a histogram with a very wide distribution.
    • There are values ranging from 321 to 1585
    • ax.set_xlim(300, 1800)

enter image description here

  • The benefit of this data is, it's easy to recreate the raw distribution based on x, a bin size of 1, and y being the respective count for each x.
  • np.repeat can create an array with repeat elements
import numpy
import matplotlib.pyplot as plt

# given x and y from the loop
# set the type as int
y = y.astype(int)
x = x.astype(int)

# create the data
data = np.repeat(x, y)

# determine the range of x
x_range = range(min(data), max(data)+1)

# determine the length of x
x_len = len(x_range)

# plot
fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(10, 10))

ax1.hist(data, bins=x_len)  # outliers are not plotted
ax2.boxplot(data, vert=False)
plt.show()

enter image description here

  • Given data, you can now perform whatever analysis is required.
  • SO: Fit gaussian to noisy data with lmfit
  • LMFIT Docs
  • Cross Validated might be a better site for diving into the model
  • All of the error calculation parameters come from the model result. If you calculate new x and y from np.histogram for different bin widths, that may affect the errors.
    • approxcen = 780 is also an input to result
# given x_len determine how many bins for a given bin width
width = 8
bins = int(np.round(x_len / width))

# determine new x and y for the histogram
y, x = np.histogram(data, bins=bins)

# Fit data to Gaussian
gmodel = Model(gaussian)
result = gmodel.fit(y, x=x[:-1], amp=8, cen=approxcen, wid=1)

# result
print(result.fit_report())

[out]:
[[Model]]
    Model(gaussian)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 314
    # data points      = 158
    # variables        = 3
    chi-square         = 397.702574
    reduced chi-square = 2.56582306
    Akaike info crit   = 151.851284
    Bayesian info crit = 161.039069
[[Variables]]
    amp:  1174.80608 +/- 37.1663147 (3.16%) (init = 8)
    cen:  775.535731 +/- 0.46232727 (0.06%) (init = 780)
    wid:  12.6563219 +/- 0.46232727 (3.65%) (init = 1)
[[Correlations]] (unreported correlations are < 0.100)
    C(amp, wid) =  0.577

# plot
plt.figure(figsize=(10, 6))
plt.bar(x[:-1], y)
plt.plot(x[:-1], result.best_fit, 'r-', label='best fit')

enter image description here

plt.figure(figsize=(20, 8))
plt.bar(x[:-1], y)
plt.xlim(700, 850)
plt.plot(x[:-1], result.best_fit, 'r-', label='best fit')
plt.grid()

enter image description here

  • As we can see from the next code block, the error is related to the following parameters
    • stderrThreshold = 10
    • minimumAmplitude = 0.1
    • MaestroT = 53
    ## Append to list if error in amplitude and amplitude itself is within reasonable bounds
    if result.params['amp'].stderr < stderrThreshold and result.params['amp'] > minimumAmplitude:
        amps.append(result.params['amp'].value) 
        ampserr.append(result.params['amp'].stderr) 
        ts.append(MaestroT*n)

Upvotes: 3

Related Questions