Reputation: 143
I know parts of this question have been asked before, but I have some related questions.
I'm trying to execute
mysqldump -u uname -ppassword --add-drop-database --databases databaseName | gzip > fileName
I'm potentially dumping a very large (200GB?) db. Is that in itself a dumb thing to do? I then want to send the zipped file over the network for storage, delete the local dump, and purge a couple of tables.
Anyway, I was using subprocess like this, because there doesn't seem to be a way to execute the entire original call without subprocess treating | as a table name:
from subprocess import Popen, PIPE
f = open(FILENAME, 'wb')
args = ['mysqldump', '-u', 'UNAME', '-pPASSWORD', '--add-drop-database', '--databases', 'DB']
p1 = Popen(args, stdout=PIPE)
p2 = Popen('gzip', stdin=p1.stdout, stdout=f)
p2.communicate()
but then I read that communicate caches the data in memory, which wouldn't work for me. Is this true?
What I ended up doing for now is:
import gzip
import subprocess

# args as above; f is the dump file, opened for writing
subprocess.call(args, stdout=f)
f.close()

f = open(filename, 'rb')
zipFilename = filename + '.gz'
f2 = gzip.open(zipFilename, 'wb')
f2.writelines(f)
f2.close()
f.close()
of course this takes a million years, and I hate it.
My questions:
1. Can I use my first approach on a very large db?
2. Could I possibly pipe the output of mysqldump to a socket and fire it across the network and save it when it arrives, rather than sending a zipped file?
Thanks!
Upvotes: 4
Views: 6809
Reputation: 77337
You don't need communicate(). It's only there as a convenience method if you want to read stdout/stderr to completion. But since you are chaining the commands, they are doing that for you. Just wait for them to complete.
from subprocess import Popen, PIPE
args = ['mysqldump', '-u', 'UNAME', '-pPASSWORD', '--add-drop-database', '--databases', 'DB']
with open(FILENAME, 'wb', 0) as f:
    p1 = Popen(args, stdout=PIPE)
    p2 = Popen('gzip', stdin=p1.stdout, stdout=f)
    p1.stdout.close() # force write error (/SIGPIPE) if p2 dies
    p2.wait()
    p1.wait()
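As a follow-up sketch: wait() stores each process's exit status in its returncode attribute, so you can check both afterwards to confirm that the dump and the compression actually succeeded, for example:

# Sketch: verify that both processes exited cleanly.
if p1.returncode != 0:
    raise RuntimeError('mysqldump failed with exit code %d' % p1.returncode)
if p2.returncode != 0:
    raise RuntimeError('gzip failed with exit code %d' % p2.returncode)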
Upvotes: 8
Reputation: 487883
Your example code using two subprocess.Popen calls is correct (albeit slightly improvable), and this:

... I read that communicate caches the data in memory

is also correct: it reads into memory all of the standard output and standard error output that the "communicating command" produces on a subprocess.PIPE. But that is not a problem here, because you have this:
p1 = Popen(args, stdout=PIPE)
p2 = Popen('gzip', stdin=p1.stdout, stdout=f)
p2.communicate()
You're calling communicate() on p2, whose stdout output is sent to f (an opened file), and whose stderr output (which is probably empty anyway, assuming no errors occur) is not being sent to a PIPE. Thus, p2.communicate() would at worst have to read and buffer up a grand total of zero bytes of stdout plus zero bytes of stderr. It's actually a bit more clever, noticing that there is no PIPE, so it returns the tuple (None, None).
If you were to call p1.communicate(), that would be more of an issue (although in this case you'd then be fighting with p2, the gzip process, for the output from p1, which would be even worse). But you are not; p1's output flows to p2, and p2's output flows to a file.
Since none of p2's output is sent to a PIPE, there is no need to call p2.communicate() here: you can simply call p2.wait(). That makes it clearer that there's no data flowing back from p2 (which I would say is a minor improvement to the code, although if you decide you want to capture p2's stderr after all, you'd have to change that back).
Edit to add: as in glglgl's answer, it's important to close p1's pipe to p2 after creating p2, otherwise p2 will wait for your Python process to send data to it, too.
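Putting those adjustments together (p2.wait() instead of p2.communicate(), and closing p1.stdout once p2 has been created), a minimal sketch using the same placeholder names as in the question might look like this:

from subprocess import Popen, PIPE

args = ['mysqldump', '-u', 'UNAME', '-pPASSWORD', '--add-drop-database', '--databases', 'DB']
f = open(FILENAME, 'wb')
p1 = Popen(args, stdout=PIPE)
p2 = Popen('gzip', stdin=p1.stdout, stdout=f)
p1.stdout.close()  # close our copy of the pipe once p2 has it
p2.wait()          # no PIPE on p2, so wait() is all that's needed
p1.wait()
f.close()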
Upvotes: 2
Reputation: 91017
You are quite close to where you want to be:
from subprocess import Popen, PIPE
f = open(FILENAME, 'wb')
args = ['mysqldump', '-u', 'UNAME', '-pPASSWORD', '--add-drop-database', '--databases', 'DB']
p1 = Popen(args, stdout=PIPE)
Up to here it is right.
p2 = Popen('gzip', stdin=p1.stdout, stdout=PIPE)
This one takes p1's output and processes it. Afterwards we can (and should) immediately call p1.stdout.close().
Now we have a p2.stdout which can be read from and, without using a temporary file, sent over the network:
import socket

s = socket.create_connection(('remote_pc', port))
while True:
    r = p2.stdout.read(65536)
    if not r:
        break
    s.sendall(r)  # sendall() retries until the whole chunk is written
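On the receiving machine, a minimal sketch of the other end, assuming the same port and a hypothetical destination filename, could accept the connection and write the incoming bytes straight into a .gz file:

import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(('', port))                    # same port as the sender uses
srv.listen(1)
conn, addr = srv.accept()
with open('dump.sql.gz', 'wb') as out:  # hypothetical destination file
    while True:
        chunk = conn.recv(65536)
        if not chunk:
            break
        out.write(chunk)
conn.close()
srv.close()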
Upvotes: 3
Reputation: 4725
Yup, the data is buffered in memory:
"Note The data read is buffered in memory, so do not use this method if the data size is large or unlimited." - subprocess docs
Unfortunately, at the moment there is no way to use Popen asynchronously: PEP 3145
Rather than doing this all in Python, you can manually do
os.system("mysqldump -u uname -ppassword --add-drop-database --databases databaseName | gzip > fileName")
with the appropriate string replacements using str.format, of course; otherwise you're putting an unnecessary amount of stress on your computer, especially when trying to communicate 200GB via a pipe ...
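For instance, a minimal sketch of that one-liner built with str.format (the user, password, database, and output path below are placeholders, not values from the question):

import os

# Sketch only: build the shell pipeline as a string and hand it to the shell.
cmd = ("mysqldump -u {user} -p{password} --add-drop-database "
       "--databases {db} | gzip > {outfile}").format(
           user='uname', password='password', db='databaseName',
           outfile='fileName.gz')
os.system(cmd)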
Can you elaborate on what you are trying to do? Right now it sounds like you're both dumping and zipping on the same computer.
Yes, you can stream a file across the network. I don't know if you want to stream the output of mysql directly, though; you might want to look at your network capabilities before considering that.
bash:
#!/bin/bash
mysqldump -u uname -ppassword --add-drop-database --databases databaseName | gzip > fileName
#transfer fileName to other computer
^ you can also put this in a crontab and have it run at intervals :)
Upvotes: 2