Borealis

Reputation: 8480

How to compare predicted vs real frequency data?

I have a machine learning approach that counts cars in jpeg images. For each image, I have a predicted count of the number of cars from the machine learning approach and a real count of the number of cars based on a human's count. This is what the dataset looks like:

             predicted_cars   real_cars
Image_1      2                1
Image_2      6                7
Image_3      0                0
Image_4      0                1
Image_5      0                0
Image_6      1                1
...
Image_5000   4                3

My initial thought was to use linear regression, but since this dataset contains discrete count data I assume that would be inappropriate. Additionally, the majority of counts are likely to be 0, which may skew the statistics.

What approach can I take to statistically and/or graphically assess how well the predicted car counts compare to the "real" car counts? I am working in Python with scikit-learn and pandas.

Upvotes: 2

Views: 636

Answers (2)

yatu

Reputation: 88266

Calculating the accuracy of the result here is quite straightforward: you could take the mean absolute error or the mean squared error, for instance. You can find a wide variety of error metrics in sklearn.metrics.
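For example, with the two columns from your DataFrame (toy values standing in for the 5000-image dataset):

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Toy data mirroring the sample rows in the question
df = pd.DataFrame({"predicted_cars": [2, 6, 0, 0, 0, 1],
                   "real_cars":      [1, 7, 0, 0, 0, 1]})

mae = mean_absolute_error(df["real_cars"], df["predicted_cars"])
mse = mean_squared_error(df["real_cars"], df["predicted_cars"])
print(f"MAE: {mae:.3f}, MSE: {mse:.3f}")
```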

And for a visual representation of the results, one way would be to plot a stacked bar chart:

df.plot(kind='bar', stacked=True)

[image: stacked bar chart of predicted vs real car counts per image]
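A self-contained version of that one-liner, using hypothetical sample data matching the question:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data matching the rows in the question
df = pd.DataFrame({"predicted_cars": [2, 6, 0, 0, 0, 1],
                   "real_cars":      [1, 7, 0, 0, 0, 1]},
                  index=[f"Image_{i}" for i in range(1, 7)])

# One stacked bar per image, with the two count columns stacked
ax = df.plot(kind="bar", stacked=True)
ax.set_ylabel("car count")
plt.tight_layout()
plt.show()
```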

Upvotes: 1
