Antimony

Reputation: 2240

TypeError when plotting histogram with Matplotlib

I'm trying to plot a histogram of a file of float numbers. The contents of the file look like this:

0.1066770707640915
0.0355590235880305
0.0711180471760610
0.4267082830563660
0.0355590235880305
0.1066770707640915
0.0698755355867468
0.0355590235880305
0.0355590235880305
0.0355590235880305
0.0355590235880305
0.0355590235880305
0.2844721887042440
0.0711180471760610
0.0711180471760610
0.0355590235880305
0.0355590235880305
0.1422360943521220
0.0355590235880305
0.0355590235880305
0.0711180471760610
0.0355590235880305
0.0355590235880305
0.0355590235880305
...

For some reason, my attempt raises TypeError: len() of unsized object.

import matplotlib.pyplot as plt

input_file = "inputfile.csv"
file = open(input_file, "r")
all_lines = list(file.readlines())
file.close()

for line in all_lines:
    line = float(line.strip()) # Removing the '\n' at the end and converting to float
    if not isinstance(line, float): # Verifying that all data points could be converted to float
        print type(line)

print len(all_lines)
# 146445

print type(all_lines)
# <type 'list'>

plt.hist(all_lines, bins = 10) # This line throws the error
plt.show()

I have scoured SO looking for similar problems. It appears that this error is common when trying to plot non-numeric data types, but this is not the case here, since I explicitly check the data type of each number to ensure that they are not a strange data type.

Is there something obvious that I am missing?

Upvotes: 0

Views: 1424

Answers (1)

tmdavison

Reputation: 69116

Your loop does not actually convert the items of all_lines to floats in place; it takes each item, converts it to a float, and assigns the result to the loop variable line, but it does not change the value in the list. So, when you come to plot all_lines, the items are still stored as strings.

You could instead change all values in the list to floats using a list comprehension as follows:

all_lines = [float(line) for line in all_lines]
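
To see the difference, here is a minimal sketch using a short hypothetical sample in place of the file contents (the names sample, types_after_loop and types_after_comprehension are made up for illustration). It shows that assigning to the loop variable leaves the list untouched, while the list comprehension builds a new list of floats:

```python
# Hypothetical sample standing in for the lines read from the file
sample = ["0.1066770707640915\n", "0.0355590235880305\n", "0.0711180471760610\n"]

for line in sample:
    # Rebinds the name `line` only; the item stored in `sample` is unchanged
    line = float(line.strip())

# Collect the type names of what is actually stored in the list after the loop
types_after_loop = {type(item).__name__ for item in sample}

# The comprehension builds a new list; float() ignores surrounding whitespace
converted = [float(line) for line in sample]
types_after_comprehension = {type(item).__name__ for item in converted}

print(types_after_loop)
print(types_after_comprehension)
```

The first set contains only str and the second only float, which is why plt.hist still receives strings after the original loop.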

Even better might be to read the file with numpy: the values are then stored as floats in a numpy array, and you save yourself the trouble of iterating through the lines of the file:

import numpy as np
import matplotlib.pyplot as plt

input_file = "inputfile.csv"
all_lines = np.genfromtxt(input_file)

plt.hist(all_lines, bins = 10)
plt.show()
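
If you would rather stay with plain Python, a small helper along these lines might do. This is only a sketch: parse_floats is a hypothetical name, and skipping blank lines is an assumption on my part (float('') raises ValueError, so a trailing empty line would otherwise stop the conversion).

```python
def parse_floats(lines):
    """Convert an iterable of text lines to floats, skipping blank lines."""
    # `line.strip()` is empty (falsy) for blank lines, so they are filtered out;
    # float() itself tolerates the trailing '\n' on the remaining lines
    return [float(line) for line in lines if line.strip()]


# Hypothetical sample; in practice you would pass the open file object instead
sample = ["0.1066770707640915\n", "0.0355590235880305\n", "\n"]
values = parse_floats(sample)
print(values)
```

With the real file you could open it with a with statement (with open(input_file) as f: all_lines = parse_floats(f)), which also closes the file for you, and then pass all_lines to plt.hist as before.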

Upvotes: 1
