Pissed Off Banker
Pissed Off Banker

Reputation: 115

Python multiprocessing - Build up of something causing script to hang?

I have a function that scrolls an image, and a subprocess within it that generates the next image behind the scenes to display, so that the transition between the displaying image and the new image is seamless.

The script works fine at the start. However, after a few hours of using this script, the subprocess eventually gets stuck on a write call. Killing the subprocesses doesn't help, as the next subprocess spawned also gets stuck. So it definitely means something is building up and is not clearing up in the script, but I'm not sure what.

The only way to fix this is to kill the parent process, and run it again.

   def updateMultiple(self, options):
        # print('options', options)
        for option in options:
            
            
            if option not in ['Custom GIFs', 'Custom Images', 'Custom Messages']: # these images are already saved in user uploads, dodnt need to update them 
                img = self.functions[option]()
                img.save('./display_images/'+ option+ '.ppm')

def scrollFunctionsAnimated(self, options, animation = 'down', repeat = True):
        
        # scrolls trhough all functions with animation. Updates functions and remakes images when each function not being dispplayed 
        
        self.updateMultiple([options[0]])
 
        
        kill = False 
        i = 0 # keep track of which image we are displaying
        self.double_buffer = self.matrix.CreateFrameCanvas()
        
        while True:
            
            update_process = Process(target = self.updateMultiple, args = ([options[(i+1) % len(options)]],))
            update_process.start()
            update_process.join()
            
            
            settings = json.load(open(self.JSONs[options[(i) % len(options)]]))
        
            self.set_delay(settings['speed'])
            animation = settings['animation'].lower()
            
            if options[i % len(options)]  == 'Custom Images':
                images = self.getUserImages()
                
            elif options[i % len(options)]  == 'Custom GIFs':
                images = self.getUserGIFs()
                
            elif options[i % len(options)]  == 'Custom Messages':
                images = self.getUserMessages()
                
            
            else: #these options just have a single ppm image
                image = self.openImage('./display_images/' + options[i % len(options)] +'.ppm')
                
                images = [image]
                
            
            for image in images:
                img_width, img_height = image.size
                
                offset_x = 0
                if animation == 'continuous':
                    offset_x = 128
                elif animation in ['up', 'down']:
                    offset_x = max(0, 128-img_width)
                
                offset_y = 0
                #first scroll image in from bottom
                
                
                
                frame_skip = int((1/15)/self.delay) #controls how fast gifs run 
                self.frame = 0
                
                pause_frames = int(0.5/self.delay)
                if animation == 'up':
                    offset_y = 33
                    direction = -1
                    kill = self.scrollImageY(image, direction = direction, offset_x = offset_x, offset_y = offset_y, frame_skip = frame_skip, gif = options[i % len(options)] == 'Custom GIFs')
                elif animation == 'down':
                    direction = 1
                    offset_y = -33
                    kill = self.scrollImageY(image, direction = direction, offset_x = offset_x, offset_y = offset_y, frame_skip = frame_skip, gif = options[i % len(options)] == 'Custom GIFs')
                    
                if kill: break
                offset_y = 0
                
                
                if animation in ['up', 'down']:
                    while pause_frames > 0:
                        if pause_frames%frame_skip  == 0:
                            self.incrementGIF(image)
                        
                        pause_frames -=1
                        if options[i % len(options)] != 'Custom GIFs':
                            self.double_buffer.SetImage(image, offset_x, offset_y)
                        else:
                            self.double_buffer.SetImage(image.convert('RGB'), offset_x, offset_y)
                    
                        self.double_buffer = self.matrix.SwapOnVSync(self.double_buffer)
                        
                        time.sleep(self.delay)
                        kill = self.checkKilled()
                        if kill: break
                    
                    
                    if kill: break
                    
                           
                if kill: break
                try: 
                    pause = float(settings['pause'])
                except:
                    pause = 0
                pause_frames = int(float(pause)/self.delay) 
                kill = self.scrollImage(image, offset_x = offset_x, offset_y = offset_y, frame_skip = frame_skip, gif = options[i % len(options)] == 'Custom GIFs', pause_frames = pause_frames)
                
                if kill: break
            if kill:break
            if not repeat:
                break
            update_process.join()
            update_process.terminate()
            i+=1

File descriptors of the hanging subprocess:

File descriptors of the hanging subprocess

enter image description here

I also have logs from strace below:

This is what it looks like when the script is working normally.

strace: Process 1693 attached
[pid  1693] close(4)                    = 0 <0.000028>
[pid  1693] openat(AT_FDCWD, "/dev/null", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 4 <0.000077>
[pid  1693] write(1, "options ['Forex']\n", 18) = 18 <0.000048>
[pid  1693] openat(AT_FDCWD, "csv/forex_settings.json", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000064>
[pid  1693] read(6, "{\"feature\": \"Stocks\", \"speed\": \""..., 348) = 347 <0.000034>
[pid  1693] read(6, "", 1)              = 0 <0.000025>
[pid  1693] close(6)                    = 0 <0.000039>
[pid  1693] openat(AT_FDCWD, "feature_titles/forex.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000062>
[pid  1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0`\0\0\0 \4\3\0\0\0\37\356`"..., 4096) = 388 <0.000035>
[pid   412] close(5)                    = 0 <0.000064>
[pid   412] openat(AT_FDCWD, "csv/crypto_settings.json", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000143>
[pid   412] read(5, "{\"feature\": \"Stocks\", \"speed\": \""..., 349) = 348 <0.000060>
[pid   412] read(5, "", 1)              = 0 <0.000040>
[pid   412] close(5)                    = 0 <0.000097>
[pid   412] openat(AT_FDCWD, "./display_images/Crypto.ppm", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000125>
[pid  1693] close(6)                    = 0 <0.000027>
[pid   412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000052>
[pid  1693] openat(AT_FDCWD, "/home/pi/logos/stocks/down-1.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000154>
[pid   412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000073>
[pid   412] read(5,  <unfinished ...>
[pid  1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\20\0\0\0\16\4\3\0\0\0\324\1\201"..., 4096) = 199 <0.000035>
[pid   412] <... read resumed> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 61440) = 50542 <0.000554>
[pid   412] read(5, "", 8192)           = 0 <0.000047>
[pid   412] close(5)                    = 0 <0.000095>
[pid  1693] close(6)                    = 0 <0.000044>
[pid  1693] openat(AT_FDCWD, "/home/pi/logos/currencies/EUR.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000074>
[pid  1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 968 <0.000040>
[pid  1693] openat(AT_FDCWD, "/home/pi/logos/currencies/USD.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 7 <0.000146>
[pid  1693] read(7, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 984 <0.000038>
[pid  1693] close(7)                    = 0 <0.000056>
[pid  1693] close(6)                    = 0 <0.000042>
[pid  1693] openat(AT_FDCWD, "/home/pi/logos/stocks/up-1.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000068>
[pid  1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\20\0\0\0\16\4\3\0\0\0\324\1\201"..., 4096) = 192 <0.000039>
[pid  1693] close(6)                    = 0 <0.000045>
[pid  1693] openat(AT_FDCWD, "/home/pi/logos/currencies/USD.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 6 <0.000112>
[pid  1693] read(6, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 984 <0.000039>
[pid  1693] openat(AT_FDCWD, "/home/pi/logos/currencies/JPY.png", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 7 <0.000151>
[pid  1693] read(7, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\26\0\0\0\26\10\3\0\0\0\363j\234"..., 4096) = 769 <0.000047>
[pid  1693] close(7)                    = 0 <0.000060>
[pid  1693] close(6)                    = 0 <0.000043>
[pid  1693] openat(AT_FDCWD, "./display_images/Forex.ppm", O_RDWR|O_CREAT|O_TRUNC|O_LARGEFILE|O_CLOEXEC, 0666) = 6 <0.000359>
[pid  1693] write(6, "P6\n507 32\n255\n", 14) = 14 <0.000104>
[pid  1693] write(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 48672) = 48672 <0.000269>
[pid  1693] close(6)                    = 0 <0.000400>
[pid  1693] +++ exited with 0 +++
[pid   412] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1693, si_uid=1, si_status=0, si_utime=5, si_stime=9} ---
[pid   412] close(4)                    = 0 <0.000064>

About 40-60 minutes later of running the Python script, here are the strace results of the hanging subprocess. PID 2220 seems to be stuck with the write(1, "options ['Forex']\n", 18 <unfinished ...> syscall? How can I check what is causing it to hang?

strace: Process 2220 attached
[pid   412] close(5)                    = 0 <0.000048>
[pid   412] openat(AT_FDCWD, "csv/crypto_settings.json", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000163>
[pid   412] read(5, "{\"feature\": \"Stocks\", \"speed\": \""..., 349) = 348 <0.000071>
[pid  2220] close(4 <unfinished ...>
[pid   412] read(5,  <unfinished ...>
[pid  2220] <... close resumed> )       = 0 <0.000083>
[pid   412] <... read resumed> "", 1)   = 0 <0.000105>
[pid  2220] openat(AT_FDCWD, "/dev/null", O_RDONLY|O_LARGEFILE|O_CLOEXEC <unfinished ...>
[pid   412] close(5 <unfinished ...>
[pid  2220] <... openat resumed> )      = 4 <0.000137>
[pid   412] <... close resumed> )       = 0 <0.000136>
[pid   412] openat(AT_FDCWD, "./display_images/Crypto.ppm", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5 <0.000099>
[pid   412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000046>
[pid  2220] write(1, "options ['Forex']\n", 18 <unfinished ...>
[pid   412] read(5, "P6\n569 32\n255\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.000079>
[pid   412] read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 61440) = 50542 <0.000322>
[pid   412] read(5, "", 8192)           = 0 <0.000067>
[pid   412] close(5)                    = 0 <0.000048>
[pid  2220] <... write resumed> )       = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <9.479316>
[pid  2220] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=412, si_uid=1} ---
[pid  2220] +++ killed by SIGTERM +++
[pid   412] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=2220, si_uid=1, si_status=SIGTERM, si_utime=0, si_stime=1} ---
[pid   412] close(4)                    = 0 <0.000218> 

Upvotes: 1

Views: 289

Answers (1)

dulngix
dulngix

Reputation: 424

It should be hang in the following line,

[pid 2220] write(1, "options ['Forex']\n", 18 <unfinished ...>

you can lookup use cat /proc/<pid>/stack to it callstack, check it a few times to see if it has changed, if it doesn't change, it means it's hanging

fd = 1 -> /dev/pts/0, the stdout is the remote terminal, it is recommended that you redirect to the file when running python, eg:

python test.py >test.log 2>&1 0>/dev/null
Example of running to the background and redirect
root@dev8:~/tmp# cat test.py
import time
import sys

while True:
    print(time.time())
    sys.stdout.flush()
    time.sleep(1)
    
root@dev8:~/tmp# python test.py >test.log 2>&1 0>/dev/null &
[1] 4203

root@dev8:~/tmp# ls -l /proc/4203/fd
total 0
l-wx------ 1 root root 64 Apr  6 19:16 0 -> /dev/null
l-wx------ 1 root root 64 Apr  6 19:16 1 -> /root/tmp/test.log
l-wx------ 1 root root 64 Apr  6 19:16 2 -> /root/tmp/test.log
root@dev8:~/tmp# head test.log
1649243765.043044
1649243766.0584888
1649243767.1079023

Upvotes: 1

Related Questions