Reputation: 8480
I have a machine learning approach that counts cars in jpeg images. For each image, I have a predicted count of the number of cars from the machine learning approach and a real count of the number of cars based on a human's count. This is what the dataset looks like:
           predicted_cars  real_cars
Image_1                 2          1
Image_2                 6          7
Image_3                 0          0
Image_4                 0          1
Image_5                 0          0
Image_6                 1          1
...
Image_5000              4          3
My initial thought was to use linear regression, but since this dataset contains discrete count data, I assume that would be inappropriate. Additionally, since the majority of counts will likely be 0, that zero inflation is likely to skew the statistics.
What approach can I take to statistically and/or graphically assess how well the predicted car counts compare to the "real" car counts? I am working in Python with scikit-learn and pandas.
Upvotes: 2
Views: 636
Reputation: 88266
Measuring accuracy here is straightforward: you could take the mean absolute error or the mean squared error, for instance. You can find a wide variety of error metrics in sklearn.metrics.
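A minimal sketch of both metrics, using a small hypothetical DataFrame in the same shape as the question's dataset (the values below are made up for illustration):

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical data mimicking the question's predicted vs. human counts
df = pd.DataFrame(
    {"predicted_cars": [2, 6, 0, 0, 0, 1], "real_cars": [1, 7, 0, 1, 0, 1]},
    index=[f"Image_{i}" for i in range(1, 7)],
)

# Convention: y_true first, y_pred second
mae = mean_absolute_error(df["real_cars"], df["predicted_cars"])
mse = mean_squared_error(df["real_cars"], df["predicted_cars"])
print(mae, mse)  # → 0.5 0.5
```

MAE is often the more interpretable choice for counts, since it reads directly as "on average the model is off by 0.5 cars per image", while MSE penalises large miscounts more heavily.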
And for a visual representation of the results, one way would be to plot a stacked bar chart:
df.plot(kind='bar', stacked=True)
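Fleshing that one-liner out into a runnable sketch (the DataFrame here is hypothetical, and a stacked chart assumes you want the two counts drawn on top of each other per image; drop `stacked=True` for side-by-side bars):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data in the question's shape
df = pd.DataFrame(
    {"predicted_cars": [2, 6, 0, 0, 0, 1], "real_cars": [1, 7, 0, 1, 0, 1]},
    index=[f"Image_{i}" for i in range(1, 7)],
)

ax = df.plot(kind="bar", stacked=True)
ax.set_ylabel("car count")
plt.tight_layout()
plt.savefig("car_counts.png")
```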
Upvotes: 1
Reputation: 1265
For accuracy, pick a scoring metric, e.g. mean squared error: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error
Upvotes: 0