smatthewenglish
smatthewenglish

Reputation: 2889

data visualization. 3D, precison, recall, and f-measure. maybe using ocatve?

I've been running a machine learning algorithm, I have output in the form of Precision, Recall, and F-Measure.

I'd like to graph this data so I can get a clearer conception of how things are really going, but I don't really know how to do that. I suppose I can use Octave? I heard about it in that Andrew Ng course and I've already got it on my machine, but I don't really know how to use it to visualize data.

Does anyone with experience in this know how I might best proceed or some helpful resources on the best way to go about this?

0.011723329425556858 P 0.6000000238418579 R 0.010416666977107525 F1 0.02047781631341665
0.012895662368112544 P 0.6363636255264282 R 0.01215277798473835 F1 0.023850085569817648
0.01406799531066823 P 0.6666666865348816 R 0.013888888992369175 F1 0.027210884568890845
0.015240328253223915 P 0.6153846383094788 R 0.013888888992369175 F1 0.02716468612858015
0.016412661195779603 P 0.6428571343421936 R 0.015625 F1 0.03050847456668239
0.017584994138335287 P 0.6000000238418579 R 0.015625 F1 0.03045685282259509
0.01875732708089097 P 0.5625 R 0.015625 F1 0.030405405405405407
0.01992966002344666 P 0.529411792755127 R 0.015625 F1 0.030354131580674088
0.021101992966002344 P 0.5555555820465088 R 0.0173611119389534 F1 0.03367003527554599
0.022274325908558032 P 0.5263158082962036 R 0.0173611119389534 F1 0.03361344696816966
0.023446658851113716 P 0.5 R 0.0173611119389534 F1 0.033557048526295
0.0246189917936694 P 0.4761904776096344 R 0.0173611119389534 F1 0.03350083906570289

Upvotes: 0

Views: 76

Answers (1)

greeness
greeness

Reputation: 16114

I suppose the first column is some threshold you varied between lines. The precision-recall graph is precision-vs-recall. Thus we can first retrieve those two columns from your data: (suppose your data are saved in prf.data).

cat prf.data | awk '{print $3,$5}'

You will get below two columns only and you can initialize a 2d matrix in octave:

data = [
0.6000000238418579 0.010416666977107525
0.6363636255264282 0.01215277798473835
0.6666666865348816 0.013888888992369175
0.6153846383094788 0.013888888992369175
0.6428571343421936 0.015625
0.6000000238418579 0.015625
0.5625 0.015625
0.529411792755127 0.015625
0.5555555820465088 0.0173611119389534
0.5263158082962036 0.0173611119389534
0.5 0.0173611119389534
0.4761904776096344 0.0173611119389534];

Then under octave, below command will print each row as a data point in the graph:

plot(data(:,2), data(:,1), 'x')
ylabel('precision')
xlabel('recall')

enter image description here

Looks like with some threshold increase, you are decreasing precision and the recall stays the same (for example, when threshold = 0.021, 0.022, 0.023, 0.024).

Upvotes: 1

Related Questions