Daan Seuntjens

Reputation: 960

Condor OutOfMemoryError

I'm running some code to test whether I can use the OpenCV library on a Condor system. It works fine when I run python testje.py in the console, but when I submit it to Condor I get the following error message:

Traceback (most recent call last):
File "/usr/data/condor/execute/dir_323648/condor_exec.exe", line 24, in <module>
    kp, des = sift.detectAndCompute(img, None)
cv2.error: OpenCV(4.5.3) /tmp/pip-req-build-3umofm98/opencv/modules/core/src/alloc.cpp:73: error: (-4:Insufficient memory) Failed to allocate 48385936 bytes in function 'OutOfMemoryError'

So it failed to allocate about 48 MB of RAM?
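
For reference, a quick sketch of the arithmetic behind that number. It only converts the byte count from the error message and compares it with the 512 MB that the log reports as allocated (treated here as MiB, which is an assumption).

```python
# Size of the allocation that OpenCV reported as failed (from the traceback).
failed_bytes = 48385936

# Memory the Condor log reports as allocated, assumed to be in MiB.
allocated_mib = 512

failed_mb = failed_bytes / 1000**2   # decimal megabytes
failed_mib = failed_bytes / 1024**2  # binary mebibytes
share = failed_mib / allocated_mib   # fraction of the allocated memory

print(f"{failed_mb:.1f} MB = {failed_mib:.1f} MiB, "
      f"{share:.1%} of the allocated {allocated_mib} MiB")
```

So the single failed request is under a tenth of what the slot was given.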

When looking at the .log file:

Partitionable Resources :    Usage  Request Allocated 
   Cpus                 :                 1         1 
   Disk (KB)            :   192786   768000   1557164 
   Gpus (Average)       :                 0         0 
   Memory (MB)          :        0      500       512 

I requested 500 MB of RAM, which should be plenty. Why does it crash?


Full code (for completeness)

testje.py:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np
import pickle
import sys
import cv2

print('Starting')
# setup cv2
sift = cv2.SIFT_create()
img = cv2.imread("0.jpg", cv2.IMREAD_GRAYSCALE)
print(img)

print('start calc')
# calc cv2
kp, des = sift.detectAndCompute(img, None)
# calc np
norms = np.linalg.norm(des, axis=1)

# convert the keypoints to plain Python tuples so they can be pickled
index = []
for p in kp:
    temp = (p.pt, p.size, p.angle, p.response, p.octave, p.class_id)
    index.append(temp)


#store using pickle
with open('./random_dat.pickle', 'wb') as handle:
    pickle.dump((123456, index, des, norms), handle)
    
print("finished")

Note:

OpenCV was installed with pip install --target=my\python\dir\site-packages\ opencv-python so that Condor can transfer the library and the script can import it as a local library.

Condor input:

#Normal execution
Universe = vanilla

#I need just one CPU (which is the default)
RequestCpus    = 1
#No GPU
RequestGPUs    = 0
#I need 750 MB of disk space
RequestDisk = 750MB
#I need 500 MB of RAM (resident memory)
RequestMemory  = 500MB
#It will not run longer than 1 day
+RequestWalltime = 150

#Transfer input files in cur_dir
transfer_input_files = 0.jpg, site-packages/

#retrieve data
should_transfer_files = YES
when_to_transfer_output = ON_EXIT

#I'm a nice person, I think...
NiceUser = true
#Mail me when the job completes (Always; use Error to be mailed only if something is wrong)
Notification = Always

# The job will 'cd' to this directory before starting, be sure you can _write_ here.
initialdir = /users/students/r0xxxxxx/Documents/testing_condor/
# This is the executable or script I want to run
executable = /users/students/r0xxxxxx/Documents/testing_condor/testje.py

#Output of condors handling of the jobs, will be in 'initialdir'
Log          = condor_bin.log
#Standard output of the 'executable', in 'initialdir'
Output       = condor_bin.out
#Standard error of the 'executable', in 'initialdir'
Error        = condor_bin.err

# Start just 1 instance of the job
Queue 1

Full condor log:

...
000 (3xx.xxx.xxx) 2021-07-19 16:20:57 Job submitted from host: <10.xx.xx.xxx:xxxx?addrs=10.xx.xx.xxx-xxxx&alias=name.xxxx.xxxxxxxx.be&noUDP&sock=schedd_xxxx_xxxx>
...
040 (3xx.xxx.xxx) 2021-07-19 16:21:15 Started transferring input files
    Transferring to host: <10.87.24.13:9618?addrs=10.xx.xx.xx-xxxx&alias=other.xxxx.xxxxxx.be&noUDP&sock=slotx_x_xxxxxx_xxxx_xxxx>
...
040 (3xx.xxx.xxx) 2021-07-19 16:21:19 Finished transferring input files
...
001 (3xx.xxx.xxx) 2021-07-19 16:21:20 Job executing on host: <10.xx.xx.xxx:xxxx?addrs=10.xx.xx.xx-xxxx&alias=other.xxxx.xxxxxxxx.be&noUDP&sock=startd_xxxx_xxxx>
...
006 (3xx.xxx.xxx) 2021-07-19 16:21:22 Image size of job updated: 1
    0  -  MemoryUsage of job (MB)
    0  -  ResidentSetSize of job (KB)
...
040 (3xx.xxx.xxx) 2021-07-19 16:21:22 Started transferring output files
...
040 (3xx.xxx.xxx) 2021-07-19 16:21:22 Finished transferring output files
...
005 (3xx.xxx.xxx) 2021-07-19 16:21:22 Job terminated.
    (1) Normal termination (return value 1)
        Usr 0 00:00:01, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        Usr 0 00:00:01, Sys 0 00:00:00  -  Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
    566  -  Run Bytes Sent By Job
    197400704  -  Run Bytes Received By Job
    566  -  Total Bytes Sent By Job
    197400704  -  Total Bytes Received By Job
    Partitionable Resources :    Usage  Request Allocated 
       Cpus                 :                 1         1 
       Disk (KB)            :   192786   768000   1557164 
       Gpus (Average)       :                 0         0 
       Memory (MB)          :        0      500       512 

    Job terminated of its own accord at 2021-07-19T14:21:22Z.
...

Python prints:

Starting
[[ 21  83  40 ...   2  36  57]
 [ 42  51  27 ...  53  44  28]
 [ 60  31 127 ...  46  28  20]
 ...
 [103  80  22 ...  26  58 105]
 [ 58  63  47 ...  44  49  66]
 [ 48  49  56 ...  64  51  57]]
start calc

Upvotes: 0

Views: 150

Answers (0)
