batuman

Reputation: 7304

How to understand face detection xml

I have trained faces using opencv_traincascade.exe. I have a series of xml files for different stages. Each xml file has internalNodes and leafValues, and one of them is shown below.

<?xml version="1.0"?>
<opencv_storage>
<stage0>
  <maxWeakCount>3</maxWeakCount>
  <stageThreshold>-1.3019366264343262e+000</stageThreshold>
  <weakClassifiers>
    <_>
      <internalNodes>
        0 -1 2711 -2099201 -2623493 -774797061 -2162625 -827343685
        -5535541 -1163949377 -21761</internalNodes>
      <leafValues>
        -9.2679738998413086e-001 6.0445684194564819e-001</leafValues></_>
    <_>
      <internalNodes>
        0 -1 1533 -252379683 -203697739 1410462197 1435881947 -74449473
        -1147414357 1510080511 -1</internalNodes>
      <leafValues>
        -9.1606438159942627e-001 6.2200444936752319e-001</leafValues></_>
    <_>
      <internalNodes>
        0 -1 917 -42468780 -11479728 -745548289 -2371181 -23070497
        -552607093 -74777633 -536871937</internalNodes>
      <leafValues>
        -9.2716777324676514e-001 5.4092508554458618e-001</leafValues></_></weakClassifiers></stage0>
</opencv_storage>

My queries are: (1) What do stageThreshold, internalNodes, and leafValues mean? (2) In actual face detection, how are they used in the cascaded classifier? I have read a few papers on the AdaBoost algorithm, but I don't understand it quite well. Thanks
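To make my current understanding of query (2) concrete, here is a toy sketch of how a cascade might evaluate a window: each stage sums one leafValue per weak classifier and compares the sum with the stageThreshold, and a window must pass every stage to be reported as a face. All the names (`run_cascade`, `active_set`, the toy numbers) are mine, not OpenCV's:

```python
def run_cascade(stages, window_features):
    # stages: list of (weak_classifiers, stage_threshold)
    # weak classifier: (feature_index, active_set, leaf_active, leaf_inactive)
    for weak_classifiers, stage_threshold in stages:
        total = 0.0
        for feat_idx, active_set, leaf_a, leaf_i in weak_classifiers:
            code = window_features[feat_idx]          # LBP code, 0..255
            total += leaf_a if code in active_set else leaf_i
        if total < stage_threshold:
            return False     # rejected early -- the whole point of a cascade
    return True              # survived every stage: face candidate

# toy stage: two stump classifiers, each voting +0.6 / -0.9
stage0 = ([(0, {7}, 0.6, -0.9), (1, {3}, 0.6, -0.9)], -0.5)
print(run_cascade([stage0], {0: 7, 1: 3}))   # both active: 1.2 >= -0.5 -> True
print(run_cascade([stage0], {0: 1, 1: 1}))   # neither: -1.8 < -0.5 -> False
```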

Upvotes: 3

Views: 2260

Answers (2)

Tony

Reputation: 11

The values are indeed quite cryptic. Thanks @halfer, I understand them a lot better now. His reply already got most things correct, but some things are missing or not quite correct:

In the stage file, the feature index in internalNodes is the "absolute" index over all LBP features. In the cascade file, a feature map is appended at the end, describing the rectangles of the features. Each rectangle actually describes a 3x3 grid of blocks with 16 intersection points in total. LBP features are calculated from the gray values at those 16 points.
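The 3x3-grid description above can be sketched as a multi-block LBP computation over an integral image. This is my own illustration, not OpenCV's code: the neighbour order and the `>=` comparison are assumptions and may differ from OpenCV's exact bit layout, but the mechanics (each block sum needs only four integral-image lookups, eight comparisons against the centre block give one byte) are the same:

```python
def integral_image(img):
    # (H+1) x (W+1) summed-area table: ii[y][x] = sum of img[:y][:x]
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def mb_lbp_code(ii, x, y, w, h):
    # x, y: top-left of a 3x3 grid of w-by-h blocks (16 grid points total);
    # each block sum costs four integral-image lookups
    def block(bx, by):
        return ii[by + h][bx + w] - ii[by][bx + w] - ii[by + h][bx] + ii[by][bx]
    s = [[block(x + j * w, y + i * h) for j in range(3)] for i in range(3)]
    center = s[1][1]
    nbrs = [s[0][0], s[0][1], s[0][2], s[1][2],
            s[2][2], s[2][1], s[2][0], s[1][0]]   # assumed clockwise order
    code = 0
    for v in nbrs:                 # 8 comparisons -> one byte, 0..255
        code = (code << 1) | (v >= center)
    return code

# uniform image: every block sum equals the centre, so every bit is set
flat = [[1] * 6 for _ in range(6)]
print(mb_lbp_code(integral_image(flat), 0, 0, 2, 2))   # 255
```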

The first two values are always 0 and -1 when training with depth = 1. They correspond to the children of the node. A value > 0 means it is an internal node, and the value is the index of the next node. A value ≤ 0 means it is a leaf, and its negation is the index into leafValues. (This can be seen in the CvCascadeBoostTree::write method in boost.cpp in the OpenCV implementation.)

Leaf values are the weights of each tree calculated by the boosting algorithm. Usually Gentle AdaBoost is used.

The most interesting part is the remaining 8 values in internalNodes. Together, the eight 32-bit numbers form a bitmap with 256 entries! So the LBP features are treated as categorical data with 256 possible values, and every tree has a bitmap deciding whether a feature value is active or not. This makes the decision stumps far more powerful, since the decision boundaries are no longer linear.
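A sketch of how such a 256-entry bitmap could be queried (my own illustration; the convention that a set bit selects the *first* leaf matches my reading of OpenCV's stump evaluation, but treat it as an assumption):

```python
def weak_output(internal_nodes, leaf_values, lbp_code):
    # internal_nodes: one row from the XML, e.g. [0, -1, 2711, b0, ..., b7]
    # the eight trailing words are signed in the XML; mask to unsigned 32-bit
    subset = [w & 0xFFFFFFFF for w in internal_nodes[3:]]
    bit_set = (subset[lbp_code >> 5] >> (lbp_code & 31)) & 1
    # assumption: a set bit means "active" and selects the first leaf
    return leaf_values[0] if bit_set else leaf_values[1]

# toy classifier: only LBP code 0 is in the active set
nodes = [0, -1, 5, 1, 0, 0, 0, 0, 0, 0, 0]
leaves = [-1.0, 1.0]
print(weak_output(nodes, leaves, 0))   # -1.0 (bit set)
print(weak_output(nodes, leaves, 1))   #  1.0 (bit clear)
```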

Upvotes: 1

batuman

Reputation: 7304

After digging into detection_based_tracker.cpp, I now understand what internalNodes, leafValues, and stageThreshold are and how they are used. When we look at lbpcascade_frontalface.xml, we see a list of rectangles. These are the rectangles of the trained face images (i.e. these areas have distinct features and can be used to differentiate faces from non-face images). lbpcascade_frontalface.xml has 139 rectangles.

Each rectangle's x,y points are multiplied by a constant to make three additional rectangles, so one rectangle actually represents four rectangles.

Now I will explain what the internalNodes are:

<internalNodes>
            0 -1 13 -163512766 -769593758 -10027009 -262145 -514457854
            -193593353 -524289 -1</internalNodes>

The first two numbers, 0 and -1, represent left and right. I think they refer to the left and right leafValues. The third number is the feature index: if we put those 139 rectangles into an array, the feature index refers to the array index, i.e. which rectangle to use. The last eight numbers represent corner-point subtractions from the four rectangles. These are calculated from the integral images, so the numbers are quite big.

I am not quite sure how the leafValues are computed, but the sum of these leafValues is compared with the stageThreshold to decide face or non-face.
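That comparison can be checked directly with the stage0 numbers from the question. Which leaf each classifier contributes depends on its bitmap test; here I just take the all-positive and all-negative branches for illustration:

```python
# leafValues from stage0 of the XML in the question: each weak classifier
# contributes either its first or second leaf, depending on the bitmap test
neg = [-9.2679738998413086e-001, -9.1606438159942627e-001, -9.2716777324676514e-001]
pos = [6.0445684194564819e-001, 6.2200444936752319e-001, 5.4092508554458618e-001]
threshold = -1.3019366264343262e+000

print(sum(pos) >= threshold)   # True: all-positive branches pass stage 0
print(sum(neg) >= threshold)   # False: all-negative branches get rejected
```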

That is what I understood from debugging the code.

Upvotes: 6
