user3029412

Reputation: 119

Simple line plot in python is rounding values to integers. Why?

I'm trying to add python to my repertoire (R is my program of choice) and am having an issue with a simple line plot.

While the generated array (in this case, y) is of float type (which I want), when I plot it as a simple line plot with matplotlib, that same y is now truncated to whole integers.

Any help would be appreciated.

Thanks. Here's sample code. P.S. Any hints as to cleaning up the code would also be more than welcome.

import sys
import numpy as np
from numpy import random
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as matplotlib
plt.style.use('ggplot')

greens = np.array([0,0])
others = np.array(np.arange(1,37))

# no axis provided, array elements will be flattened
roulette = np.append(greens, others)

spins1000 = np.array(random.choice(roulette, size=(1000)))

# Create function for cum mean in python

def cum_mean(arr):
    cum_sum = np.cumsum(arr, axis=0)    
    for i in range(cum_sum.shape[0]):       
        if i == 0:
            continue        
        print(cum_sum[i] / (i + 1))
        cum_sum[i] =  cum_sum[i] / (i + 1)
    return cum_sum

y = np.array(cum_mean(spins1000))
x = np.array(np.arange(1,1001))

fig, ax = plt.subplots(figsize=(10, 6))
ax.set(xlim=(0, 1000), ylim=(10.00, 25.00))
line = ax.plot(x, y, color='red', lw=1)[0]
plt.draw()
plt.show()

Upvotes: 0

Views: 869

Answers (1)

JohanC

Reputation: 80289

There are two things happening, which in combination cause the strange behavior.

  • With arr being an array of integers, cum_sum = np.cumsum(arr, axis=0) is also an array of integers
  • In the loop, cum_sum[i] = cum_sum[i] / (i + 1) stores the result (a float) back into that integer array; the assignment silently casts it to an integer, dropping the fractional part (truncation toward zero, not rounding)
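A minimal sketch reproducing both effects on a tiny array. The values 3, 4, 8 and the name mean_1 are made up for illustration; they are not from the question.

```python
import numpy as np

arr = np.array([3, 4, 8])          # integer input, like the roulette spins
cum_sum = np.cumsum(arr)           # cumulative sum keeps the integer dtype: [3, 7, 15]
print(cum_sum.dtype.kind)          # 'i' (signed integer)

mean_1 = cum_sum[1] / 2            # 7 / 2 = 3.5, a float
cum_sum[1] = mean_1                # stored into the int array: the .5 is dropped
print(cum_sum.tolist())            # [3, 3, 15]
```

Note that printing cum_sum[i] / (i + 1) before the assignment (as the question's loop does) shows the correct float, which is why the problem looks like it happens later, in the plot.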

One solution is to create cum_sum as a float array from the start (cum_sum = np.cumsum(arr, dtype=float)). Another is to do things "the numpy way" and create the new array in one go: return cum_sum / np.arange(1, cum_sum.shape[0] + 1). Numpy's array operations are vectorized, so dividing an array by an array of the same length gives the same result as dividing element by element, and it runs considerably faster than a Python loop (similar to vectorized operations in R).
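Both fixes can be sketched side by side to check that they agree. The function names cum_mean_float and cum_mean_vectorized are hypothetical, chosen only for this comparison.

```python
import numpy as np

def cum_mean_float(arr):
    # Fix 1: accumulate in a float array, so the in-place division keeps decimals
    cum_sum = np.cumsum(arr, dtype=float)
    for i in range(1, cum_sum.shape[0]):
        cum_sum[i] = cum_sum[i] / (i + 1)
    return cum_sum

def cum_mean_vectorized(arr):
    # Fix 2: divide the whole cumulative-sum array by the counts 1..n at once
    cum_sum = np.cumsum(arr)
    return cum_sum / np.arange(1, cum_sum.shape[0] + 1)

spins = np.array([3, 4, 8])
print(cum_mean_float(spins).tolist())       # [3.0, 3.5, 5.0]
print(cum_mean_vectorized(spins).tolist())  # [3.0, 3.5, 5.0]
```

The vectorized version needs no loop and no special case for the first element, since dividing by 1 leaves it unchanged.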

Also, if you write cum_sum = cum_sum / np.arange(1, 1001), the name cum_sum is rebound to a new float array. Only when you assign element by element into the existing array does it stay an array of integers. Note that np.arange() already returns a numpy array, so wrapping it in np.array() again is redundant.
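A small sketch of the difference between rebinding the name and writing into the existing array. It uses slice assignment (b[:] = ...) as a stand-in for the element-by-element loop; both write into the same integer array. The names a, b and counts are illustrative only.

```python
import numpy as np

counts = np.arange(1, 4)              # [1, 2, 3]

a = np.cumsum(np.array([3, 4, 8]))    # int array [3, 7, 15]
a = a / counts                        # rebinds the name to a new float array
print(a.dtype.kind, a.tolist())       # f [3.0, 3.5, 5.0]

b = np.cumsum(np.array([3, 4, 8]))    # int array [3, 7, 15]
b[:] = b / counts                     # writes back into the existing int array
print(b.dtype.kind, b.tolist())       # i [3, 3, 5]
```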

import matplotlib.pyplot as plt
import numpy as np
plt.style.use('ggplot')

greens = np.array([0, 0])
others = np.arange(1, 37)

# no axis provided, array elements will be flattened
roulette = np.append(greens, others)

spins1000 = np.random.choice(roulette, size=1000)

# Cumulative (running) mean: each cumulative sum divided by the number of spins so far
def cum_mean(arr):
    cum_sum = np.cumsum(arr)
    return cum_sum / np.arange(1, cum_sum.shape[0] + 1)

y = cum_mean(spins1000)
x = np.arange(1, 1001)

fig, ax = plt.subplots(figsize=(10, 6))
ax.set(xlim=(0, 1000), ylim=(10.00, 25.00))
line = ax.plot(x, y, color='red', lw=1)[0]
plt.show()
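As a sanity check on the plot, the running mean should settle near the wheel's expected value: two zeros plus 1..36 over 38 pockets gives 666 / 38, about 17.53, which sits inside the chosen ylim of (10, 25). The sketch below assumes NumPy 1.17 or later for np.random.default_rng; the seeded generator and the 100,000-spin run are additions for repeatability, not part of the original code.

```python
import numpy as np

# Same wheel as above: two green zeros plus the numbers 1..36
roulette = np.append(np.array([0, 0]), np.arange(1, 37))
expected = roulette.mean()               # 666 / 38, about 17.53

# Seeded generator (an addition, not in the original) so runs are repeatable
rng = np.random.default_rng(0)
spins = rng.choice(roulette, size=100_000)
running_mean = np.cumsum(spins) / np.arange(1, spins.size + 1)

print(round(expected, 2))                # 17.53
print(abs(running_mean[-1] - expected) < 0.5)
```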

[Figure: resulting plot]

Upvotes: 1
