Reputation: 505
I run several Python subprocesses to migrate data to S3. I noticed that the CPU usage of these subprocesses often drops to 0% and stays there for more than a minute. This significantly slows down the migration process.
Here is the pic of the sub process:
The subprocess does these things:
Spawn sub processes for each table.
for table in tables:
    print("Spawn process to process {0} table".format(table))
    process = multiprocessing.Process(name="Process " + table,
                                      target=target_def,
                                      args=(args, table))
    process.daemon = True
    process.start()
    processes.append(process)
for process in processes:
    process.join()
Query data from a database using LIMIT and OFFSET. I used the PyMySQL library to query the data.
Transform the returned data into another structure. construct_structure_def() is a function that transforms a row into another format.
buffer_string = []
for i, row_file in enumerate(row_files):
    if i == num_of_rows:
        buffer_string.append(json.dumps(construct_structure_def(row_file)))
    else:
        buffer_string.append(json.dumps(construct_structure_def(row_file)) + "\n")
content = ''.join(buffer_string)
Write the transformed data into a file and compress it using gzip.
with gzip.open(file_path, 'wb') as outfile:
    outfile.write(content)
return file_name
Upload the file to S3.
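For reference, the transform and gzip steps above can be sketched in a compact, self-contained form. This is only a minimal sketch assuming Python 3: `construct_structure` is a hypothetical stand-in for construct_structure_def(), the sample rows are made up, and `write_rows_gzip` is an illustrative name, not part of the real script. Joining with "\n" places the separator between records only, which sidesteps the last-row index check.

```python
import gzip
import json
import os
import tempfile


def construct_structure(row):
    # Hypothetical stand-in for construct_structure_def()
    return {"id": row[0], "name": row[1]}


def write_rows_gzip(rows, file_path):
    """Serialize rows as newline-delimited JSON and gzip them to file_path."""
    # join() inserts "\n" between records only, so no trailing newline is written
    content = "\n".join(json.dumps(construct_structure(row)) for row in rows)
    # In binary mode gzip expects bytes on Python 3, so encode explicitly
    with gzip.open(file_path, "wb") as outfile:
        outfile.write(content.encode("utf-8"))
    return file_path


# Round trip with made-up rows to check what ends up in the file
path = os.path.join(tempfile.mkdtemp(), "table.json.gz")
write_rows_gzip([(1, "a"), (2, "b")], path)
with gzip.open(path, "rt", encoding="utf-8") as infile:
    lines = infile.read().split("\n")
```

Reading the file back in text mode and splitting on newlines should give one JSON document per source row.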
To speed things up, I create a subprocess for each table using multiprocessing.Process from the standard library.
I ran my script in a virtual machine. Following are the specs:
I saw a post here saying that one of the most likely causes is a memory I/O limitation. So I tried running a single subprocess to test that theory, but to no avail.
Any idea why this is happening? Let me know if you guys need more information.
Thank you in advance!
Upvotes: 1
Views: 1274
Reputation: 505
It turns out the culprit was the query I ran. The query took a long time to return its result, so the Python script sat idle while waiting for it, hence the 0% usage.
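One way to confirm this kind of stall is to compare wall-clock time with CPU time around the blocking call. The snippet below is a minimal sketch assuming Python 3; `fake_slow_query` is a hypothetical stand-in that just sleeps, where the real script would call the PyMySQL cursor, and `measure` is an illustrative helper name.

```python
import time


def measure(fn, *args):
    """Call fn(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def fake_slow_query(delay):
    # Hypothetical stand-in for a slow database query:
    # it blocks for `delay` seconds without doing any computation
    time.sleep(delay)
    return []


rows, wall, cpu = measure(fake_slow_query, 0.5)
print("wall={:.2f}s cpu={:.2f}s".format(wall, cpu))
```

If the wall-clock time is large while the CPU time stays close to zero, the process is waiting on something external (here, the database) rather than computing, which matches the 0% usage seen in the process monitor.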
Upvotes: 1
Reputation: 995
Your VM is a Windows machine and I'm more of a Linux person, so I'd love it if someone could back me up here.
I think the daemon is the problem here.
I've read about daemon processes and especially about TSR.
The first line in TSR says:
Regarding computers, a terminate and stay resident program (commonly referred to by the initialism TSR) is a computer program that uses a system call in DOS operating systems to return control of the computer to the operating system, as though the program has quit, but stays resident in computer memory so it can be reactivated by a hardware or software interrupt.
As I understand it, making the process a daemon (or TSR in your case) makes it dormant until a syscall wakes it up, which I don't think is the case here (correct me if I'm wrong).
Upvotes: 0