postgres

Reputation: 2262

SVM: Why do my scores differ from the grid search results, with one class entirely absent and wrong predictions?

I have 155 images and 8 classes. Note that the features are not scaled to the range [0, 1].
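For reference, since SVMs are sensitive to feature scale, rescaling each feature column to [0, 1] with scikit-learn would look roughly like this (a sketch with illustrative values, not the real feature set):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical 5-dimensional feature vectors, with columns on
# very different scales (as in the real data below)
X = np.array([[0.27, 0.013, 0.011, 0.190, 2.44],
              [0.37, 0.034, 0.019, 0.008, 3.03],
              [0.97, 0.075, 0.234, 0.050, 1.78]])

scaler = MinMaxScaler()              # maps each column into [0, 1]
X_scaled = scaler.fit_transform(X)

print(X_scaled.min(axis=0))          # each column now starts at 0
print(X_scaled.max(axis=0))          # and ends at 1
```

The fitted scaler should then be reused (`scaler.transform`) on any new feature vector before calling `predict`.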

Grid search with cross-validation suggests a linear kernel and C = 1000, with these scores:

         precision    recall  f1-score   support

      1       0.54      0.88      0.67         8
      2       0.73      1.00      0.84         8
      3       1.00      1.00      1.00         6
      4       0.75      0.50      0.60        12
      5       0.83      0.83      0.83         6
      6       0.92      0.65      0.76        17
      7       0.71      0.42      0.53        12
      8       0.60      1.00      0.75         9

avg / total       0.77      0.73      0.72        78 

but when I train with a linear kernel and C=1000 myself, I obtain:

         precision    recall  f1-score   support

      1       0.00      0.00      0.00         0
      2       1.00      0.70      0.82        10
      3       1.00      1.00      1.00        13
      4       0.73      0.58      0.65        19
      5       1.00      0.95      0.97        19
      6       0.96      0.88      0.92        25
      7       0.82      0.67      0.73        27
      8       0.70      1.00      0.82        16

avg / total       0.88      0.81      0.84       129


Confusion matrix:
[[ 0  0  0  0  0  0  0  0]
 [ 0  7  0  0  0  0  3  0]
 [ 0  0 13  0  0  0  0  0]
 [ 2  0  0 11  0  1  0  5]
 [ 0  0  0  1 18  0  0  0]
 [ 0  0  0  0  0 22  1  2]
 [ 6  0  0  3  0  0 18  0]
 [ 0  0  0  0  0  0  0 16]]

Why does class 1 have all zeros?

I also see that the RBF kernel gives the best results, but still all zeros for the first class:

         precision    recall  f1-score   support

      1       0.00      0.00      0.00         0
      2       1.00      1.00      1.00        10
      3       1.00      1.00      1.00        13
      4       0.94      0.89      0.92        19
      5       1.00      0.95      0.97        19
      6       0.93      1.00      0.96        25
      7       1.00      0.78      0.88        27
      8       1.00      1.00      1.00        16

avg / total       0.98      0.93      0.95       129


Confusion matrix:
[[ 0  0  0  0  0  0  0  0]
 [ 0 10  0  0  0  0  0  0]
 [ 0  0 13  0  0  0  0  0]
 [ 1  0  0 17  0  1  0  0]
 [ 0  0  0  1 18  0  0  0]
 [ 0  0  0  0  0 25  0  0]
 [ 5  0  0  0  0  1 21  0]
 [ 0  0  0  0  0  0  0 16]]

Finally, when I try to predict some of the same images used in the training set:

print(clf.predict([fv]))

where fv is an image feature vector:

[0.16666666666628771, 5.169878828456423e-26, 2.3475644278196356e-21, 1.0, 1.0000000000027285]

it assigns the wrong class to the feature vector (e.g. the image belongs to class 4, but predict() returns class 5).

RE-UPDATE

image set: https://docs.google.com/file/d/0ByS6Z5WRz-h2V3RkejFkb21Fb0E/edit?usp=sharing

features of the image set: https://docs.google.com/file/d/0ByS6Z5WRz-h2YlhuUmFBaElXVEE/edit?usp=sharing

FULL CODE:

import os
import glob
import numpy as np
from numpy import array
import cv2

target = [      1,1,1,1,
          1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,
          1,2,2,2,2,2,2,
          2,2,2,2,2,2,2,
          2,2,2,2,3,3,3,
          3,3,3,3,3,3,3,
          3,3,3,4,4,4,4,
          4,4,4,4,4,4,4,
          4,4,4,4,4,4,4,        
          4,5,5,5,5,5,5,
          5,5,5,5,5,5,5,
          5,5,5,5,5,5,6,
          6,6,6,6,6,6,6,
          6,6,6,6,6,6,6,
          6,6,6,6,6,6,6,
          6,6,6,7,7,7,7,               
          7,7,7,7,7,7,7,
          7,7,7,7,7,7,7,
          7,7,7,7,7,7,7,                  
          7,7,8,8,8,8,8,
          8,8,8,8,8,8,8,
          8,8,8,8]



features = [ [0.26912666717306399, 0.012738398606387012, 0.011347858467581035, 0.1896938013442868, 2.444553429782046]
,
[0.36793086934925351, 0.034364344308391102, 0.019054536791551006, 0.0076875387476751395, 3.03091214703604]
,
[0.36793086934925351, 0.034364344308391102, 0.019054536791551006, 0.0076875387476751395, 3.03091214703604]
,
[0.30406240228443038, 0.047100329090555518, 0.0049653458889261448, 0.0004618404341300081, 5.987025009738751]
,
[0.36660353297714748, 0.034256126367653919, 0.01892501331178556, 0.007723901183105499, 3.0392760101225234]
,
[0.26708884220978957, 0.012126741224471632, 0.0063753119877062942, 0.0005937801528983894, 2.403113171408598]
,
[0.27070254516425241, 0.01293684867974746, 0.01159661796151442, 0.008380724334031727, 2.4492688425144986]
,
[0.27076540467770038, 0.012502407901054009, 0.011180048331833999, 0.0007116977225672878, 2.4068989750876266]
,
[0.22832314403919951, 0.010491475428909061, 0.0027317652016312271, 0.001417434443656981, 2.6271926274711968]
,
[0.22374814412737717, 0.0095258889624651646, 0.0040833924467236719, 0.1884906960716747, 2.5474055920602514]
,
[0.23860556210266026, 0.0067860933136106557, 0.0052050705189953389, 0.01498751040799334, 2.0545849084769694]
,
[0.32849751530034654, 0.0082079572128769367, 0.017950580842136479, 0.07211170619739862, 1.761646715256231]
,
[0.3536962871782694, 0.04335618127793292, 0.0084705562859388305, 0.003939815915497741, 3.8626463078353632]
,
[0.23642964900011443, 0.0060530993708264348, 0.0041172882106328976, 0.003276003276003276, 1.9809324414862304]
,
[0.35468301957048581, 0.043735489028639378, 0.0085420200506240735, 0.00041124057573680605, 3.873602628153773]
,
[0.35549112610207528, 0.043992218599656373, 0.0086354414147218166, 0.004276259969455286, 3.8781644572829106]
,
[0.97303451800669749, 0.075165987107118692, 0.23350656471824954, 0.04989418850724402, 1.7845923298199189]
,
[0.32292438991638828, 0.0078312712861588109, 0.018256154769458615, 0.05861489639723726, 1.754975905310628]
,
[0.36415716731096714, 0.033783635359516562, 0.0087048690616182353, 0.0007989674881691353, 3.0382507494699778]
,
[0.23247799686964493, 0.023970481957641395, 0.0020180739588722754, 0.2511737089201878, 4.987537342956105]
,
[0.25249755819322928, 0.03355835554037629, 0.0024745974458906918, 0.49168600154679043, 6.286228850887637]
,
[0.25524836990657951, 0.035216193154545015, 0.0023524820730296808, 0.49272798742138363, 6.553001816315555]
,
[0.25226043727172792, 0.033580607886770704, 0.002399474603048905, 0.4913428241631397, 6.310803986284148]
,
[0.2552359153348957, 0.034993472521483299, 0.0024465696242431606, 0.49311565696302123, 6.488164071764478]
,
[0.25249755819322928, 0.03355835554037629, 0.0024745974458906918, 0.49168600154679043, 6.286228850887637]
,
[0.19296658297366265, 0.0073667093687413854, 0.0010128002719554498, 0.20292887029288703, 2.6022382484976103]
,
[0.23130715659438109, 0.023652143308649062, 0.0020734509865596379, 0.2519981194170193, 4.96809084167716]
,
[0.23646940610897133, 0.025909457534721684, 0.0019634358569802723, 0.25097465886939574, 5.263654156113397]
,
[0.61892415483059771, 0.1855733578950316, 0.024118739298890277, 0.00010742003920831431, 5.579333799263049]
,
[0.61892415483059771, 0.1855733578950316, 0.024118739298890277, 0.00010742003920831431, 5.579333799263049]
,
[0.62187109165606835, 0.18810005977070685, 0.060143785970969831, 0.005752046658462197, 5.609811692923419]
,
[0.64410628333823972, 0.20178318336365086, 0.039546324622261202, 8.006565383614564e-05, 5.609490756132282]
,
[0.6214309265075304, 0.18779664186718673, 0.061337975720487534, 0.006350402281839464, 5.608301926807521]
,
[0.20135445416653119, 0.0070220507238874311, 0.0027092098815647042, 0.4125833006664053, 2.4256545571324732]
,
[0.20123494853445922, 0.0069845347246147793, 0.0027020357704780201, 0.4106724003127443, 2.420576584506546]
,
[0.2015816556223165, 0.0070631416111702362, 0.0025149608542164329, 0.4106073986851143, 2.4300340608128606]
,
[0.70115857527896985, 0.35625759453714789, 0.028386898853323388, 0.001234186979327368, 12.446918085552586]
,
[0.68366020888533297, 0.2387861974848598, 0.04047049559400958, 0.0725675987982436, 6.011803834536788]
,
[0.70115857527896985, 0.35625759453714789, 0.028386898853323388, 0.001234186979327368, 12.446918085552586]
,
[0.71378846605495283, 0.37185054375086962, 0.078338189105938844, 0.4899937460913071, 12.727628852581882]
,
[0.72219309919241148, 0.37567368174335658, 0.029371875736917675, 0.48066298342541436, 12.21840343375]
,
[0.84033907078880576, 0.29025638999406633, 0.090118665350957639, 0.00013319126265316994, 4.572824986179928]
,
[0.84033907078880576, 0.29025638999406633, 0.090118665350957639, 0.00013319126265316994, 4.572824986179928]
,
[0.84078478547550572, 0.28881268265635862, 0.092759120470064349, 0.0005334044539271903, 4.542932448095888]
,
[0.86195880470328134, 0.31149212664075476, 0.090341088591145105, 0.00044657097288676234, 4.673692966632184]
,
[0.85542893012496013, 0.29898764801731947, 0.17279563533793374, 0.0005314202205393915, 4.543371196521408]
,
[0.68653873117620423, 0.24135977292901584, 0.031609483792605572, 0.4553053169259345, 6.032229402405299]
,
[0.68937407444389065, 0.2429428175127194, 0.031783181019183315, 0.07118412046543464, 6.017180801429501]
,
[0.66262362984605561, 0.22830191525650573, 0.027222059698182095, 0.4712353884941554, 6.170703008647743]
,
[0.85191326598415906, 0.0066280315423251869, 0.18568977018064967, 0.24070082098793744, 1.211324246965761]
,
[0.41763663758743241, 0.0042550997098748248, 0.01052268995786553, 0.000998003992015968, 1.3702049090803978]
,
[0.47955540731641061, 0.036031336698149265, 0.0037552308556160824, 0.41911764705882354, 2.3102900509255964]
,
[0.28510645493450759, 0.017800467984914338, 0.0013560744373383752, 0.6212718064153067, 2.7591153064421485]
,
[0.28093855472961832, 0.017019535454492932, 0.0025233674347249074, 0.6243626062322947, 2.733908520445971]
,
[0.28510645493450759, 0.017800467984914338, 0.0013560744373383752, 0.6212718064153067, 2.7591153064421485]
,
[0.29957424000441979, 0.020997289413265056, 0.0032514165703168524, 0.002352941176470588, 2.8737257187232768]
,
[0.28093855472961832, 0.017019535454492932, 0.0025233674347249074, 0.6243626062322947, 2.733908520445971]
,
[0.94384505611284442, 0.0070361165614443756, 0.17778161251377933, 0.00013138014845956775, 1.1950816827585424]
,
[1.2480442396269933, 0.013169393067805945, 0.37414805554448649, 0.0018769272020378066, 1.202522486580245]
,
[0.82815785035628164, 0.0071847611802335776, 0.17226935935994725, 0.24680054800013365, 1.2280429227515923]
,
[0.55468014442636804, 0.04844726528488761, 0.074669093941655343, 0.3799483919692869, 2.3157520760049994]
,
[0.85603162865577076, 0.010190325204698992, 0.14635589096917062, 0.00018691588785046728, 1.2673797230628077]
,
[0.55881837183305305, 0.048068057730781634, 0.06639403930381195, 0.3722541921910773, 2.291289872230647]
,
[0.55650701031519434, 0.047379164870780005, 0.075834025272625227, 0.3768812839567851, 2.2847828255276856]
,
[0.59736941845983627, 0.054964632904472815, 0.089651232352172761, 0.0002190940461192967, 2.291980379225357]
,
[0.55468014442636804, 0.04844726528488761, 0.074669093941655343, 0.3799483919692869, 2.3157520760049994]
,
[0.37385965430511475, 0.019136318061858774, 0.0017515265254845647, 0.002456248081056187, 2.1746841721523915]
,
[0.3755068478409902, 0.019166948350188812, 0.0045621553498242356, 0.4868705591597158, 2.1680040687479902]
,
[0.376117657056177, 0.020048016077051325, 0.004081551918441755, 0.48440424204616345, 2.20746211913412]
,
[0.18567611209815035, 0.0017735326711233123, 0.00026719643703200545, 0.37649076434123163, 1.5866887090683386]
,
[0.15935887794419157, 3.0968737461516311e-05, 4.6106803792004044e-06, 7.109594397639615e-05, 1.0723690004464064]
,
[0.1598493732922015, 9.6513614204532248e-05, 1.4807540465080871e-05, 0.020011435105774727, 1.130966420539851]
,
[0.15976502679964721, 9.179670697435723e-05, 1.1098997372160861e-05, 0.027888446215139442, 1.127590980529105]
,
[0.15948519514589277, 8.8904788108173233e-05, 3.0493405326069049e-07, 0.825754804580883, 1.1256719774569757]
,
[0.16617638537179313, 0.0020240604885197228, 3.5948671354276501e-05, 0.00017182868679926113, 1.7424826840700272]
,
[0.16617882105231332, 0.002010285330985506, 3.1650697838912209e-05, 0.00017161489617298782, 1.7390017992958084]
,
[0.16601904246228144, 0.001959487143766989, 3.2733987503779933e-05, 0.10968404829180581, 1.7271461688896599]
,
[0.16628339469915165, 0.0020643314471593802, 1.4502279324313873e-05, 0.14276914653343373, 1.7519319117125625]
,
[0.16629298316796565, 0.0020800819965552542, 1.9020907349023509e-05, 0.13840607699240376, 1.755817053262183]
,
[0.18572210382333143, 0.0018178104959919194, 0.0002453722722107162, 6.292672183242613e-05, 1.5959450271122788]
,
[0.78164051870269824, 0.051523793666842309, 0.015067726988898911, 4.814636494944632e-05, 1.818489926889651]
,
[0.18566012446433577, 0.0017919804956179246, 0.00018368826559889194, 0.3746835841076679, 1.590696751465318]
,
[0.1593593872646801, 3.0965616570412022e-05, 4.7608077176119086e-06, 0.013757065159432655, 1.072364982247259]
,
[0.15935971192682988, 3.4228786893989237e-05, 2.8175989802780335e-06, 0.011385902663771647, 1.0762239433773122]
,
[0.1593758710624088, 3.1730097257658988e-05, 6.5545372607421827e-06, 0.19480358030830433, 1.0732774861268992]
,
[0.15935651884191823, 3.2075768916173883e-05, 2.6894443902692268e-06, 0.011169712144620248, 1.0736994974496823]
,
[0.1593593872646801, 3.0965616570412022e-05, 4.7608077176119086e-06, 0.013757065159432655, 1.072364982247259]
,
[0.72806364396184653, 0.080927033958709829, 0.082024727906757688, 0.0003304829181641674, 2.282620340759594]
,
[0.34064008340950969, 0.031713563937392303, 0.0223935905703848, 0.5525150905432595, 3.191021756804023]
,
[0.34161716425171257, 0.032414962195661444, 0.023399763826767502, 0.5634559735427863, 3.228573480379]
,
[0.33995795036914717, 0.032291160309302944, 0.014503695651070611, 0.5517519130084575, 3.2425659662137543]
,
[0.53755813910874839, 0.12514260672326116, 0.047097530510313457, 0.0022522522522522522, 4.849281676080233]
,
[0.53892887245870857, 0.12723100136939183, 0.047871070696486759, 0.0003630422944273008, 4.914680204854179]
,
[0.52941013268525083, 0.12033870626971493, 0.044950934295866135, 0.00036251586006887804, 4.801391369341545]
,
[0.5153795221866847, 0.11396653431855266, 0.046028411270117815, 0.0017374383209396067, 4.797613736965006]
,
[0.55889931613495802, 0.13776801275023373, 0.054206231614929122, 0.0003675794890645102, 4.954346523167349]
,
[0.53892887245870857, 0.12723100136939183, 0.047871070696486759, 0.0003630422944273008, 4.914680204854179]
,
[0.53876191407701801, 0.12675358533640296, 0.048092146277654686, 0.0003630422944273008, 4.896575690597256]
,
[0.64579700029686937, 0.053345962571719745, 0.047671705312373282, 0.00021581957483543757, 2.1135534993967275]
,
[0.52907834506993823, 0.11839951044942501, 0.046693278117526091, 0.001802451333813987, 4.720197357775248]
,
[0.62431811267333093, 0.16822847351832676, 0.078460359627903944, 0.0002954864445593558, 4.830349593161275]
,
[0.52957671831590236, 0.1206620716356978, 0.044424337085019652, 0.00036251586006887804, 4.812745400588476]
,
[0.64778861076667615, 0.011264454903514588, 0.26034582337509793, 0.00017355085039916696, 1.3918887090929497]
,
[0.64767923033014785, 0.011511416466409427, 0.26619423461723268, 0.0001713355606956224, 1.3970897837418754]
,
[0.64175254514795532, 0.051344373338613858, 0.047562712202626603, 0.0015838339705079192, 2.091594563276403]
,
[0.74328372556577627, 0.069102582620664751, 0.082952746646336797, 0.0001621665450417579, 2.094372254494601]
,
[0.63983023392719118, 0.050957609005336219, 0.04065234770126492, 0.0002180787264202377, 2.0902782497935077]
,
[0.64175254514795532, 0.051344373338613858, 0.047562712202626603, 0.0015838339705079192, 2.091594563276403]
,
[0.39929495902359424, 0.088487529110910193, 0.022225937358985204, 0.0016210739614994933, 6.842658946475011]
,
[0.40318161986196532, 0.091372930642081962, 0.029342259032521321, 0.0016383370878558263, 6.991543657993919]
,
[0.40286945787178563, 0.092489700223200605, 0.029477042699685527, 0.0008298755186721991, 7.159524821606994]
,
[0.401527045553835, 0.092940206887656154, 0.022384335964308343, 0.0008262755629002272, 7.307506331212089]
,
[0.48221520941584561, 0.080925707098030486, 0.01508266157389335, 0.016811768237766436, 3.877246216887803]
,
[0.23300739937344839, 0.0081726649803679097, 0.00070589920573164966, 0.7233009708737864, 2.267880404181219]
,
[0.4889793426754816, 0.13379642486830962, 0.0079207484968624713, 0.0012550988390335738, 6.938143452247703]
,
[0.50805268679046123, 0.15157146566770596, 0.002286367854475147, 0.0015261350629530714, 7.558054436128668]
,
[0.50504588069443601, 0.14372144884265609, 0.002332370870321935, 0.4888972525404592, 7.020435652999047]
,
[0.49053398407596349, 0.13596678236015974, 0.0068673835378752004, 0.5062523683213338, 7.054927383254023]
,
[0.27047698059047881, 0.02400759815979293, 0.0042725763257732184, 0.1406003159557662, 3.6822411994354223]
,
[0.67217292360607472, 0.21411359416298198, 0.038240138048085716, 0.00030014256771966684, 5.418493234141116]
,
[0.66809561834310183, 0.20843134771175456, 0.055569614057154701, 0.0005965697240865026, 5.316112363334643]
,
[0.69764902288163122, 0.23441611695166623, 0.040989861350760971, 0.00030097817908201655, 5.535854638867057]
,
[0.69337536416934831, 0.23122440548075349, 0.039932976305992858, 0.0011285832518245428, 5.5253522283788445]
,
[0.48053616103332131, 0.078827555080480394, 0.014699769292604886, 0.00040342914775592535, 3.810804845605404]
,
[0.51893243284454049, 0.14486098229876093, 0.007011404157031503, 0.0013995801259622112, 6.503015780005906]
,
[0.51611281879296478, 0.14397569681830566, 0.0063953861901166996, 0.0024067388688327317, 6.552602133840095]
,
[0.52265570318341037, 0.14786059553298658, 0.021856594872657918, 0.002438599547117227, 6.567632701584826]
,
[0.30079480228240624, 0.022512205511218238, 0.00042758792096778651, 0.016516516516516516, 2.990535008572801]
,
[0.30656959740479811, 0.025225633729599333, 0.00052074639660009423, 0.014692653673163419, 3.1500163953105362]
,
[0.36561931104389456, 0.034065616542602442, 0.00073193209081989026, 0.5295319844676067, 3.0388637406298646]
,
[0.30523253105219622, 0.024888851231432006, 0.00049965741600376489, 0.014692653673163419, 3.1395734571173244]
,
[0.30228106501925794, 0.02294279475480349, 0.00029015539686061685, 0.016315633343221597, 3.0087064225809246]
,
[0.48449572183350859, 0.08057148632400099, 0.014649379545360155, 0.0008072653884964682, 3.8293932305935345]
,
[0.48696620229608523, 0.082309882547938931, 0.015050994484143265, 0.0008004802881729037, 3.8679773897921153]
,
[0.28412339537248588, 0.026648939499942827, 0.0040253434951236042, 0.652089407191448, 3.7009800669447657]
,
[0.28496156479277329, 0.02656759057204762, 0.0040076364850396805, 0.6479146459747818, 3.672807600295908]
,
[0.27750673534987835, 0.024513847513161952, 0.0040536738369991365, 0.6610337972166997, 3.589249226795383]
,
[0.23076358836711391, 0.0081276558884353922, 0.0011346229787721842, 0.004830917874396135, 2.2823193871783753]
,
[0.23009954177415121, 0.0067688972295314211, 0.00050627342206410546, 0.7085308056872038, 2.1131083582556887]
,
[0.74667089537876641, 0.017808021782196932, 0.00058715813729321711, 0.20097746402389358, 1.4352297290097118]
,
[0.46459021914407012, 0.015923283050662724, 0.0096104956720461029, 0.07748745012228087, 1.7457829468097172]
,
[0.46915422400128481, 0.016432642899682152, 0.019842029469490614, 0.07962922414422305, 1.75192535048515]
,
[0.46603526212803831, 0.014906446836800192, 0.034027862564102791, 0.00017277871366247678, 1.709953995454109]
,
[0.74667089537876641, 0.017808021782196932, 0.00058715813729321711, 0.20097746402389358, 1.4352297290097118]
,
[0.74667089537876641, 0.017808021782196932, 0.00058715813729321711, 0.20097746402389358, 1.4352297290097118]
,
[0.79130699331490484, 0.018730652999112182, 0.0025843522081647448, 0.18977700753966478, 1.4182461597025509]
,
[0.78526444941147622, 0.019630664985282237, 0.0014735307445837577, 0.19151016964319956, 1.43434362458419]
,
[1.2360091274851985, 0.11319166323186233, 0.037129035449204553, 0.13274704929414488, 1.7480015935515811]
,
[1.2379748172284306, 0.11372770880048684, 0.03880647583352842, 0.1327272446632812, 1.748796779378963]
,
[1.0687065973690613, 0.06124884507730273, 1.0261941877753638, 0.0006237784339002786, 1.6027241684534652]
,
[1.0719786963564104, 0.066016997091209076, 0.67377325492164508, 0.000951317367746205, 1.6304903448777237]
,
[1.0726105544893461, 0.060421845064782209, 0.68185690192755832, 0.0016887717274899085, 1.5946007712754786]
,
[0.46459021914407012, 0.015923283050662724, 0.0096104956720461029, 0.07748745012228087, 1.7457829468097172]
,
[0.46578048886757484, 0.014968271641939066, 0.034797974069617273, 0.009791711402190251, 1.7124764457384176]
,
[0.47125303755595782, 0.015662651510987502, 0.0092152255656893708, 0.00017202081451855674, 1.723199095190638]
             ]




############################ PREDICTION TEST 1 IMAGE ################
print("TRY IMAGE")
import numpy as np
from sklearn import svm, metrics
X = features
y = target
from sklearn.svm import SVC
C = 1000.0
clf = svm.SVC(kernel='rbf', C=C).fit(X, y)
#svm.SVC(kernel='linear', C=C).fit(X, y) #SVC()
#clf.fit(X, y)
print("prediction")

# fv belongs to class 8, but predict() below returns class 5
fv = [0.16666666666628771, 5.169878828456423e-26, 2.584939414228212e-22, 1.0, 1.0000000000027285]
print(fv)
print(clf.predict([fv]))




############### METRICS ##########


# Evaluate on all samples except the first 26.
# Caveat: these samples were also used to train clf, and the slice
# skips every class-1 sample.
expected = y[26:]
predicted = clf.predict(X[26:])
print("expected")
print(len(expected))
print("predicted")
print(len(predicted))

print("Classification report for classifier %s:\n%s\n" % (
    clf, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))

Upvotes: 2

Views: 1632

Answers (1)

ogrisel

Reputation: 40169

You train a model on the full dataset and then compute the score on a subset of the training set, namely everything after the first 26 samples. Those first 26 samples include every sample of the first class (label 1), which is why that class never shows up in your evaluation.
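A minimal sketch of that slicing effect, using the label layout from the question (19 class-1 labels, then 17 class-2, then 13 class-3; classes 4-8 omitted for brevity):

```python
# Abbreviated version of the `target` list from the question:
# 19 samples of class 1, then 17 of class 2, then 13 of class 3
# (classes 4-8 omitted for brevity).
target = [1] * 19 + [2] * 17 + [3] * 13

held_out = target[26:]   # the slice evaluated in the question's code

print(1 in held_out)     # False: class 1 is never evaluated
print(held_out.count(2)) # 10: matches the support reported for class 2
```

This reproduces exactly the supports in the question's second report: class 1 has support 0, class 2 has support 10, class 3 has support 13.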

You cannot evaluate the model this way: you need to randomly shuffle the data and then split it into a training set and a test set before training the model (otherwise the whole dataset is the training set and you have no separate test set). If you do:

import numpy as np
from sklearn import svm, metrics
from sklearn.cross_validation import train_test_split
from sklearn.svm import SVC

# convert to arrays so that slicing and .shape work below
X = np.array(features)
y = np.array(target)

X_train, X_test, y_train, y_test = train_test_split(X, y,
        test_size=0.25, random_state=42)

C = 1000.0
clf = svm.SVC(kernel='rbf', C=C).fit(X_train, y_train)
y_predicted = clf.predict(X_test)

print("Classification report for classifier %s:\n%s\n" % (
    clf, metrics.classification_report(y_test, y_predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(y_test, y_predicted))

print("Predicting on 1 sample")
print("Input features:")
fv = [0.16666666666628771, 5.169878828456423e-26, 2.584939414228212e-22, 1.0, 1.0000000000027285]
print(fv)
print("Predicted class index:")
print(clf.predict([fv]))

You will get the following output:

Classification report for classifier SVC(C=1000.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,
  gamma=0.0, kernel=rbf, max_iter=-1, probability=False, shrinking=True,
  tol=0.001, verbose=False):
             precision    recall  f1-score   support

          1       0.50      0.25      0.33         4
          2       0.75      1.00      0.86         6
          3       1.00      1.00      1.00         2
          4       0.75      1.00      0.86         3
          5       1.00      0.88      0.93         8
          6       1.00      1.00      1.00         5
          7       0.75      0.75      0.75         8
          8       1.00      1.00      1.00         3

avg / total       0.84      0.85      0.83        39


Confusion matrix:
[[1 1 0 0 0 0 2 0]
 [0 6 0 0 0 0 0 0]
 [0 0 2 0 0 0 0 0]
 [0 0 0 3 0 0 0 0]
 [0 0 0 1 7 0 0 0]
 [0 0 0 0 0 5 0 0]
 [1 1 0 0 0 0 6 0]
 [0 0 0 0 0 0 0 3]]
Predicting on 1 sample
Input features:
[0.1666666666662877, 5.169878828456423e-26, 2.584939414228212e-22, 1.0, 1.0000000000027285]
Predicted class index:
[5]

Of course, this is a single random train/test split, and since your dataset is very small, the score estimate has high variance. You can estimate the expected mean score of this model class and parameter set with repeated, shuffled cross-validation:

from sklearn.cross_validation import ShuffleSplit
from sklearn.cross_validation import cross_val_score
from scipy.stats import sem

params = dict(kernel='rbf', C=1000)
clf = svm.SVC(**params)
cv = ShuffleSplit(X.shape[0], n_iter=50)
cv_scores = cross_val_score(clf, X, y, cv=cv)

print("Cross Validated test scores for SVC with params {0} on full dataset:".format(params))
print("Mean: {0:.3} +/-{1:.3}".format(np.mean(cv_scores), sem(cv_scores)))
print("Standard deviation: {0:.3}".format(np.std(cv_scores)))

Which will output:

Cross Validated test scores for SVC with params {'kernel': 'rbf', 'C': 1000} on full dataset:
Mean: 0.834 +/-0.0125
Standard deviation: 0.0872

So you can reasonably expect about 83% predictive accuracy in general (or a bit higher, as this CV procedure tends to underestimate slightly).

My first piece of advice, if you want to significantly improve on this level of performance, would be to collect more labeled samples to get a larger dataset.

The second would be to generate more labeled data from the existing samples by applying small perturbations to the original images (e.g. small translations, rotations and a bit of uniform random noise), then extracting features from those additional samples.
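A hedged sketch of that augmentation idea using only NumPy (`perturb` is a hypothetical helper; with real images you would typically use proper affine warps, e.g. via OpenCV's `cv2.warpAffine`):

```python
import numpy as np

rng = np.random.RandomState(42)

def perturb(image, max_shift=2, noise_scale=0.01):
    """Make one synthetic variant of an image: a small random
    translation plus a little uniform noise (hypothetical helper)."""
    dy, dx = rng.randint(-max_shift, max_shift + 1, size=2)
    # np.roll wraps around the border -- fine for a sketch
    shifted = np.roll(np.roll(image, dy, axis=0), dx, axis=1)
    noisy = shifted + rng.uniform(-noise_scale, noise_scale, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

# Hypothetical 32x32 grayscale image with values in [0, 1]
img = rng.rand(32, 32)

# Five extra samples derived from one original image; each variant
# keeps the original image's class label.
augmented = [perturb(img) for _ in range(5)]
```

You would then run the same feature extraction on the perturbed images and append the resulting vectors (with the original labels) to the training set.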

Edit: answers to follow-up questions:

I also left out 8/10 image samples because I think they do not belong to any class.

You should probably add an additional category named "other" for all images that don't belong to any of the previous classes.

Should I add a new class for each one and create new samples by small translations and rotations?

No, the goal is to improve the classification accuracy for the existing classes by adding more samples per class, constructing new samples out of the existing ones.

I got this error: TypeError: __init__() got an unexpected keyword argument 'n_iter' at this line: cv = ShuffleSplit(X.shape[0], n_iter=50)

n_iter is the new name in the 0.13 release. In 0.12 it was called n_iterations:

http://scikit-learn.org/0.12/modules/generated/sklearn.cross_validation.ShuffleSplit.html

Upvotes: 3
