How to multiprocess pandas dataframe using map?

Question

I am able to run my function in parallel with the pool's map function, but when I add a second variable it does not work.

       name                                    url
0   camera1     http://x.x.x.x:83/mjpg/video.mjpg
1   camera2      http://x.x.x.x:82/mjpg/video.mjpg
2   camera3     http://x.x.x.x:80/mjpg/video.mjpg
3   camera4  http://x.x.x.x:8001/mjpg/video.mjpg
4   camera5   http://x.x.x.x:8001/mjpg/video.mjpg
5   camera6     http://x.x.x.x:81/mjpg/video.mjpg
6   camera7     http://x.x.x.x:80/mjpg/video.mjpg
7   camera8     http://x.x.x.x:88/mjpg/video.mjpg
8   camera9     http://x.x.x.x:84/mjpg/video.mjpg
9  camera10      http://x.x.x.x:80/mjpg/video.mjpg

The above is my pandas dataframe; the real one has actual IPs.

The code below works. It passes only one variable to subprocess.run, and it records all of the HTTP URLs at once.

camera_df = pd.read_csv('/home/test/streams.csv',low_memory=False)
def ffmpeg_function(*arg):
        subprocess.run(["/usr/bin/ffmpeg", "-y", "-t", "10", "-i", *arg, "-f", "null", "/dev/null"], capture_output=True)

p = mp.Pool(mp.cpu_count())
camera_df['url'] = p.map(ffmpeg_function, camera_df['url'])

But when I try to add another variable to name the mp4 file that I am recording, it does not work. What I am trying to do is record each HTTP URL and name the mp4 file after the name in the column next to it.

camera_df = pd.read_csv('/home/test/streams.csv',low_memory=False)
def ffmpeg_function(*arg):
        subprocess.run(["/usr/bin/ffmpeg", "-y", "-t", "10", "-i", *arg, *arg], capture_output=True)

p = mp.Pool(mp.cpu_count())
video_file = '/home/test/test.mp4'
camera_df['url'] = p.map(ffmpeg_function, [camera_df['url'], camera_df['url']])

I get the following error:

TypeError: expected str, bytes or os.PathLike object, not Series

juanpa.arrivillaga · Accepted Answer

There is absolutely no good reason to involve pandas in any of this. Just use:

import csv
import multiprocessing as mp
import subprocess

def ffmpeg_function(args):
    # each row is (name, url): the url must follow "-i", the output file comes last
    name, url = args
    result = subprocess.run(
        ["/usr/bin/ffmpeg", "-y", "-t", "10", "-i", url, f"{name}.mp4"],  # written to the current directory
        capture_output=True,
    )
    return result.stdout # not sure what you actually need...

with open('/home/test/streams.csv') as f, mp.Pool(mp.cpu_count()) as pool:
    reader = csv.reader(f)
    # skip header in csv
    next(reader)
    result = pool.map(ffmpeg_function, reader)
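
As for why the attempt in the question fails: map hands the function one element of the iterable per call, so mapping over a list of two columns produces two calls, each receiving an entire column, rather than one call per row. The sketch below illustrates this with plain lists and the built-in map, which walks its iterable the same way Pool.map does. The sample values and the describe helper are made up for illustration.

```python
# Hypothetical sample data standing in for the two dataframe columns.
names = ["camera1", "camera2", "camera3"]
urls = ["http://a/video.mjpg", "http://b/video.mjpg", "http://c/video.mjpg"]

def describe(arg):
    # Report what a single call receives: its type and its length.
    return type(arg).__name__, len(arg)

# Mapping over a list of columns: two calls, each given a whole column.
per_column = list(map(describe, [names, urls]))

# Mapping over rows: one call per camera, each given a (name, url) pair.
per_row = list(map(describe, zip(names, urls)))

print(per_column)
print(per_row)
```

With a Series in place of each list, the whole column ends up inside the ffmpeg argument list, which is what subprocess.run rejects with the TypeError shown in the question.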

If you insist on using pandas to do this, then just use itertuples:

with mp.Pool(mp.cpu_count()) as pool:
    df = pd.read_csv('/home/test/streams.csv')
    df['whatever'] = pool.map(
        ffmpeg_function, 
        df.itertuples(index=False, name=None)
    )

There are a lot of different ways you could have done this.
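
One such alternative, sketched here under assumptions rather than taken from the original code: Pool.starmap unpacks each item of the iterable into separate positional arguments, so the worker function can take the name and the url as two parameters. The helper names build_command, record and record_all, and the relative output path, are illustrative only.

```python
import multiprocessing as mp
import subprocess

def build_command(name, url):
    # The input URL follows "-i"; the output file, named after the camera, comes last.
    return ["/usr/bin/ffmpeg", "-y", "-t", "10", "-i", url, f"{name}.mp4"]

def record(name, url):
    # Run ffmpeg for one camera and return its exit status.
    return subprocess.run(build_command(name, url), capture_output=True).returncode

def record_all(names, urls):
    # starmap calls record(name, url) once per zipped pair, spread across worker processes.
    with mp.Pool(mp.cpu_count()) as pool:
        return pool.starmap(record, zip(names, urls))
```

With the dataframe from the question this could be called as record_all(camera_df['name'], camera_df['url']), ideally under an if __name__ == "__main__": guard so that it also works on platforms where worker processes are started by spawning a fresh interpreter.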

Note that ffmpeg_function has to actually return something, otherwise pool.map just gives you a list of None. I am not exactly sure what you want; you may want to use return result.stdout.decode() if you want a string instead of a bytes object.
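
A small sketch of the bytes-versus-string difference. It runs the current Python interpreter as a stand-in child process, since ffmpeg may not be installed wherever this is tried, and the printed text is made up. Keep in mind that ffmpeg normally writes its log output to standard error, so result.stderr may turn out to be the more useful value to return.

```python
import subprocess
import sys

# Stand-in child process (an assumption for the demo): Python printing one line.
cmd = [sys.executable, "-c", "print('recorded')"]

# capture_output=True collects stdout and stderr as bytes objects.
raw = subprocess.run(cmd, capture_output=True)
as_str = raw.stdout.decode()

# text=True asks subprocess.run to decode both streams itself.
decoded = subprocess.run(cmd, capture_output=True, text=True)

print(type(raw.stdout).__name__, type(as_str).__name__, type(decoded.stdout).__name__)
```

Either form works with pool.map; the choice only affects the type of the values that end up in the result list.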
