Zobal

Reputation: 143

python subprocess and mysqldump

I know parts of this question have been asked before, but I have some related questions.

I'm trying to execute

mysqldump -u uname -ppassword --add-drop-database --databases databaseName | gzip > fileName

I'm potentially dumping a very large (200GB?) db. Is that in itself a dumb thing to do? I then want to send the zipped file over the network for storage, delete the local dump, and purge a couple of tables.

Anyway, I was using subprocess like this, because there doesn't seem to be a way to execute the whole original command in one call without subprocess treating | as an ordinary argument (it ends up being passed to mysqldump as a table name):

from subprocess import Popen, PIPE

f = open(FILENAME, 'wb')
args = ['mysqldump', '-u', 'UNAME', '-pPASSWORD', '--add-drop-database', '--databases', 'DB']

p1 = Popen(args, stdout=PIPE)
p2 = Popen('gzip', stdin=p1.stdout, stdout=f)
p2.communicate()

but then I read that communicate caches the data in memory, which wouldn't work for me. Is this true?

What I ended up doing for now is:

import gzip
import subprocess

f = open(FILENAME, 'wb')
subprocess.call(args, stdout=f)
f.close()

f = open(FILENAME, 'rb')
zipFilename = FILENAME + '.gz'
f2 = gzip.open(zipFilename, 'wb')
f2.writelines(f)   # copies the uncompressed dump into the gzip file line by line
f2.close()
f.close()

of course this takes a million years, and I hate it.

My questions:

1. Can I use my first approach on a very large db?
2. Could I possibly pipe the output of mysqldump to a socket and fire it across the network and save it when it arrives, rather than sending a zipped file?

Thanks!

Upvotes: 4

Views: 6809

Answers (4)

tdelaney

Reputation: 77337

You don't need communicate(). It's only there as a convenience method if you want to read stdout/stderr to completion. But since you are chaining the commands, they are doing that for you. Just wait for them to complete.

from subprocess import Popen, PIPE

args = ['mysqldump', '-u', 'UNAME', '-pPASSWORD', '--add-drop-database', '--databases', 'DB']

with open(FILENAME, 'wb', 0) as f:
    p1 = Popen(args, stdout=PIPE)
    p2 = Popen('gzip', stdin=p1.stdout, stdout=f)
p1.stdout.close() # force write error (/SIGPIPE) if p2 dies
p2.wait()
p1.wait()

Upvotes: 8

torek

Reputation: 487883

Your example code using two subprocess.Popen calls is correct (although it can be improved slightly), and this:

... I read that communicate caches the data in memory

is also correct—it reads into memory all of the standard-output and standard-error-output that the "communicating command" produces on a subprocess.PIPE—but not a problem here, because you have this:

p1 = Popen(args, stdout=PIPE)
p2 = Popen('gzip', stdin=p1.stdout, stdout=f)
p2.communicate()

You're calling communicate() on p2, whose stdout output is sent to f (an opened file), and whose stderr output—which is probably empty anyway (no errors occur)—is not being sent to a PIPE. Thus, p2.communicate() would at worst have to read and buffer-up a grand total of zero bytes of stdout plus zero bytes of stderr. It's actually a bit more clever, noticing that there is no PIPE, so it returns the tuple (None, None).
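If you want to convince yourself of that, here is a tiny self-contained check (using 'true' purely as a stand-in for any command started without PIPEs):

from subprocess import Popen

p = Popen(['true'])      # no stdout=PIPE or stderr=PIPE, so there is nothing to buffer
print(p.communicate())   # prints (None, None)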

If you were to call p1.communicate(), that would be more of an issue (although in this case you'd then be fighting with p2, the gzip process, for the output from p1, which would be even worse). But you are not; p1's output flows to p2, and p2's output flows to a file.

Since none of p2's output is sent to a PIPE, there is no need to call p2.communicate() here: you can simply call p2.wait(). That makes it clearer that there's no data flowing back from p2 (which I would say is a minor improvement to the code, although if you decide you want to capture p2's stderr after all, you'd have to change that back).


Edit to add: as in glglgl's answer, it's important to close p1's pipe to p2 after creating p2, otherwise p2 will wait for your Python process to send data to p2, too.
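Putting those two tweaks together, a minimal sketch of the adjusted version (same placeholder names as in the question) might look like:

from subprocess import Popen, PIPE

args = ['mysqldump', '-u', 'UNAME', '-pPASSWORD', '--add-drop-database', '--databases', 'DB']

with open(FILENAME, 'wb') as f:
    p1 = Popen(args, stdout=PIPE)
    p2 = Popen('gzip', stdin=p1.stdout, stdout=f)
    p1.stdout.close()   # drop our copy of the pipe; gzip keeps its own
    p2.wait()           # nothing is piped back from p2, so wait() suffices
    p1.wait()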

Upvotes: 2

glglgl

Reputation: 91017

You are quite close to where you want to be:

from subprocess import Popen, PIPE

f = open(FILENAME, 'wb')
args = ['mysqldump', '-u', 'UNAME', '-pPASSWORD', '--add-drop-database', '--databases', 'DB']

p1 = Popen(args, stdout=PIPE)

Up to here it is right.

p2 = Popen('gzip', stdin=p1.stdout, stdout=PIPE)

This one takes p1's output and processes it. Afterwards we can (and should) immediately call p1.stdout.close().
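Spelled out, that is just one extra line at this point:

p1.stdout.close()   # drop our handle on the pipe; gzip still holds its own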

Now we have p2.stdout, which can be read from and, without using a temporary file, sent over the network:

import socket

s = socket.create_connection(('remote_pc', port))
while True:
    r = p2.stdout.read(65536)
    if not r:
        break
    s.sendall(r)   # sendall() keeps sending until the whole chunk is out
s.close()
p2.wait()
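On the receiving machine, something has to be listening on that port and writing the stream to disk; a rough sketch (the port and the output name 'dump.sql.gz' are placeholders, not part of the answer) could be:

import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(('', port))              # same placeholder port as on the sending side
srv.listen(1)
conn, addr = srv.accept()
with open('dump.sql.gz', 'wb') as out:
    while True:
        chunk = conn.recv(65536)
        if not chunk:
            break
        out.write(chunk)
conn.close()
srv.close()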

Upvotes: 3

Eiyrioü von Kauyf

Reputation: 4725

Yup, the data is buffered in memory:

"Note The data read is buffered in memory, so do not use this method if the data size is large or unlimited." - subprocess docs

Unfortunately, at the moment there is no way to use Popen asynchronously: see PEP 3145.

Rather than doing this all in Python, you can manually do

os.system("mysqldump -u uname -ppassword --add-drop-database --databases databaseName | gzip > fileName")

with the appropriate string replacements using str.format, of course; otherwise you're putting an unnecessary amount of stress on your computer, especially trying to communicate 200 GB via a pipe ...
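For example, the substitution could look roughly like this (the values are placeholders standing in for real credentials and paths):

import os

cmd = ("mysqldump -u {user} -p{pw} --add-drop-database "
       "--databases {db} | gzip > {out}").format(
          user='uname', pw='password', db='databaseName', out='fileName.gz')
os.system(cmd)

Keep in mind that any value containing shell metacharacters would need quoting or escaping before being dropped into the command line.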

Can you elaborate on what you are trying to do? Right now it sounds like you're both dumping and zipping on the same computer.


Yes, you can stream a file across the network ... I don't know if you want to stream the output of mysqldump directly though; you might want to look at your network capabilities before considering that.


bash:

#!/bin/bash
mysqldump -u uname -ppassword --add-drop-database --databases databaseName | gzip > fileName
# transfer fileName to other computer

^ you can also put this in a crontab and have it run at intervals :)

Upvotes: 2
