Jame
Jame

Reputation: 3854

How to speed up caffe classifer in python

I am using python to use caffe classifier. I got image from my camera and peform predict image from training set. It work well but the problem is speed very slow. I thinks just 4 frames/second. Could you suggest to me some way to improve computational time in my code? The problem can be explained as following. I have to reload an network model age_net.caffemodel that its size about 80MB by following code

age_net_pretrained='./age_net.caffemodel'
age_net_model_file='./deploy_age.prototxt'
age_net = caffe.Classifier(age_net_model_file, age_net_pretrained,
           mean=mean,
           channel_swap=(2,1,0),
           raw_scale=255,
           image_dims=(256, 256))

And for each input image (caffe_input), I call the predict function

prediction = age_net.predict([caffe_input])

I think that due to size of network is very large. Then predict function takes long time to predict image. I think the slow time is from it.
This is my full reference code. It changed by me.

from conv_net import *

import matplotlib.pyplot as plt
import numpy as np
import cv2
import glob
import os
caffe_root = './caffe' 
import sys
sys.path.insert(0, caffe_root + 'python')
import caffe
DATA_PATH = './face/'
cnn_params = './params/gender_5x5_5_5x5_10.param'
face_params = './params/haarcascade_frontalface_alt.xml'
def format_frame(frame):
    img = frame.astype(np.float32)/255.
    img = img[...,::-1]
    return img   

if __name__ == '__main__':    
    files = glob.glob(os.path.join(DATA_PATH, '*.*'))

    # This is the configuration of the full convolutional part of the CNN
    # `d` is a list of dicts, where each dict represents a convolution-maxpooling
    # layer. 
    # Eg c1 - first layer, convolution window size
    # p1 - first layer pooling window size
    # f_in1 - first layer no. of input feature arrays
    # f_out1 - first layer no. of output feature arrays
    d = [{'c1':(5,5),
          'p1':(2,2),
          'f_in1':1, 'f_out1':5},
         {'c2':(5,5),
          'p2':(2,2),
          'f_in2':5, 'f_out2':10}]

    # This is the configuration of the mlp part of the CNN
    # first tuple has the fan_in and fan_out of the input layer
    # of the mlp and so on.
    nnet =  [(800,256),(256,2)]    
    c = ConvNet(d,nnet, (45,45))
    c.load_params(cnn_params)        
    face_cascade = cv2.CascadeClassifier(face_params)
    cap = cv2.VideoCapture(0)
    cv2.namedWindow("Image", cv2.WINDOW_NORMAL)

    plt.rcParams['figure.figsize'] = (10, 10)
    plt.rcParams['image.interpolation'] = 'nearest'
    plt.rcParams['image.cmap'] = 'gray'
    mean_filename='./mean.binaryproto'
    proto_data = open(mean_filename, "rb").read()
    a = caffe.io.caffe_pb2.BlobProto.FromString(proto_data)
    mean  = caffe.io.blobproto_to_array(a)[0]
    age_net_pretrained='./age_net.caffemodel'
    age_net_model_file='./deploy_age.prototxt'
    age_net = caffe.Classifier(age_net_model_file, age_net_pretrained,
               mean=mean,
               channel_swap=(2,1,0),
               raw_scale=255,
               image_dims=(256, 256))
    age_list=['(0, 2)','(4, 6)','(8, 12)','(15, 20)','(25, 32)','(38, 43)','(48, 53)','(60, 100)']
    while(True):

        val, image = cap.read()        
        if image is None:
            break
        image = cv2.resize(image, (320,240))
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5, minSize=(30,30))

        for f in faces:
            x,y,w,h = f
            cv2.rectangle(image, (x,y), (x+w,y+h), (0,255,255))            
            face_image_rgb = image[y:y+h, x:x+w]
            caffe_input = cv2.resize(face_image_rgb, (256, 256)).astype(np.float32)
            prediction = age_net.predict([caffe_input]) 
            print 'predicted age:', age_list[prediction[0].argmax()]       
        cv2.imshow('Image', image)
        ch = 0xFF & cv2.waitKey(1)
        if ch == 27:
            break
        #break

Upvotes: 1

Views: 2076

Answers (2)

dragon7
dragon7

Reputation: 1133

For all of you who still use Caffe, I'd recommend trying OpenVINO to decrease inference time. OpenVINO optimizes your model by converting to Intermediate Representation (IR), performing graph pruning and fusing some operations into others while preserving accuracy. Then it uses vectorization in runtime. OpenVINO is optimized for Intel hardware, but it should work with any CPU.

Some snippets are below.

Install OpenVINO

The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.

pip install openvino-dev[caffe]

Use Model Optimizer to convert Caffe model

The Model Optimizer is a command-line tool that comes from OpenVINO Development Package. It converts the Caffe model to IR, a default format for OpenVINO. You can also try the precision of FP16, which should give you better performance without a significant accuracy drop (change data_type). Run in the command line:

mo --input_model "age_net.caffemodel" --data_type FP32 --source_layout "[n,c,h,w]" --target_layout "[n,h,w,c]" --output_dir "model_ir"

Run the inference

The converted model can be loaded by the runtime and compiled for a specific device, e.g., CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what the best choice for you is, use AUTO. If you care about latency or throughput, I suggest adding a performance hint (as shown below) to use the device that fulfills your requirement.

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/age_net.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="AUTO", config={"PERFORMANCE_HINT":"LATENCY"}) # alternatively THROUGHPUT or CUMULATIVE_THROUGHPUT

# Get input and output layers
input_layer_ir = compiled_model_ir.input(0)
output_layer_ir = compiled_model_ir.output(0)

# Resize and reshape input image
height, width = list(input_layer_ir.shape)[1:3]
input_image = cv2.resize(input_image, (width, height))[np.newaxis, ...]

# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]

Disclaimer: I work on OpenVINO.

Upvotes: 0

Shai
Shai

Reputation: 114866

Try calling age_net.predict([caffe_input]) with oversmaple=False:

prediction = age_net.predict([caffe_input], oversample=False)

The default behavior of predict is to create 10, slightly different, crops of the input image and feed them to the network to classify, by disabling this option you should get a x10 speedup.

Upvotes: 3

Related Questions