Reputation: 49
Topic: Memory overflow caused by a small amount of data
Use-case: I have instances of objects that do some work on data. These instances should be passed to the workers along with the data. I'm testing it right now on a local machine (EC2 c6i.12xlarge, Ubuntu 18.04).
Problem: The instances of my objects cause a memory overflow even though the data and instances are only a couple of MB in size. I found that when I use third-party libraries like nltk inside the instances, the memory grows quickly with the number of CPUs used. When I don't use those third-party libraries, everything works as it should.
Expected behavior: Memory usage does not increase linearly with the number of CPUs.
Minimal example: Below is a minimal example, with its output underneath. When I pass only the data (10 MB in the example) to the workers, without the object instance, the memory overhead is negligibly small. When I pass only the instance, without data, the memory overhead scales almost linearly (1 CPU: 6 MB, 2 CPUs: 11 MB, 10 CPUs: 60 MB); it seems some package information is passed to every CPU along with the object instance, which is fine. However, when I pass both the data (10 MB) and the object instance, the data is also copied multiple times (1 CPU: 20 MB, 10 CPUs: 180 MB). When I want to run on 30-50 CPUs on a single machine with a couple of GB of data, this causes a memory overflow.
Questions: How can I pass instances of objects that depend on third-party libraries to workers without the above behavior? Is there a best practice for handling small global variables that differs from putting them in the object store?
import nltk
import psutil
import ray


class DummyObject():
    def do_something(self):
        print(nltk.__version__)


@ray.remote
def dummy_fun(*args):
    pass


def create_data(target_size_mb=10):
    """
    Create some random data
    :param target_size_mb:
    :return:
    """
    # Create a list of random strings
    data_entries = 80000 * target_size_mb  # Number of rows
    size_per_entry = 100  # Byte size per entry
    length_string = size_per_entry - 49  # Length of a string that satisfies the byte size
    payload = ['a' * length_string for i in range(data_entries)]  # Create payload as specified
    return payload


def run_problem(payload=None, config=None):
    num_cpu = 1
    tasks = num_cpu
    # Init ray
    ray.init(num_cpus=num_cpu)
    # Put it in the object storage
    payload_id = ray.put(payload)
    config_id = ray.put(config)
    # Track memory in a naive way
    start_memory = psutil.virtual_memory()[3]
    # Create jobs
    result_id = [dummy_fun.remote(config_id, payload_id) for i in range(tasks)]
    # Run jobs
    result = ray.get(result_id)
    end_memory = psutil.virtual_memory()[3]
    print('Memory usage {} MB'.format((end_memory - start_memory) / 8 / 1000 / 1000))
    ray.shutdown()


print("Payload: None \t config: Dummy Object")
run_problem(payload=None, config=DummyObject)
print("-" * 100)
print("Payload: 10 MB \t config: None")
run_problem(payload=create_data(target_size_mb=10), config=None)
print("-" * 100)
print("Payload: 10 MB \t config: Dummy Object")
run_problem(payload=create_data(target_size_mb=10), config=DummyObject)
print("-" * 100)
Output:
Payload: None config: Dummy Object
Memory usage 5.612544 MB
----------------------------------------------------------------------------------------------------
Payload: 10 MB config: None
Memory usage 0.23705600000000002 MB
----------------------------------------------------------------------------------------------------
Payload: 10 MB config: Dummy Object
Memory usage 20.628991999999997 MB
----------------------------------------------------------------------------------------------------
Process finished with exit code 0
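A quick check with cloudpickle (the serializer Ray uses) suggests the instance itself pickles small, since the nltk module is referenced by name rather than by value; the per-CPU overhead presumably comes from each worker importing nltk when it deserializes the class:

import cloudpickle

# DummyObject is defined in __main__, so cloudpickle serializes the class
# by value, but the nltk global inside it only by module name.
blob = cloudpickle.dumps(DummyObject)
print('Pickled class size: {} bytes'.format(len(blob)))  # KB range, not MB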
EDIT: Singleton
When a singleton holds the DummyObject instance in a variable, memory usage is back to normal; presumably the instance is captured in the decorator's closure and shipped once with the definition of dummy_fun rather than with every task. I have only tried this on a single machine.
import nltk
import psutil
import ray


def singleton(cls):
    instances = {}

    def getinstance(**kwargs):
        if cls not in instances:
            instances[cls] = cls(**kwargs)
        return instances[cls]

    return getinstance


@singleton
class SingletonStorage:
    def __init__(self, storage):
        print('ping')
        self.storage = storage


class DummyObject():
    def do_something(self):
        print(nltk.__version__)


@ray.remote
def dummy_fun(*args):
    SingletonStorage(storage=None).storage.do_something()


def create_data(target_size_mb=10):
    """
    Create some random data
    :param target_size_mb:
    :return:
    """
    # Create a list of random strings
    data_entries = 80000 * target_size_mb  # Number of rows
    size_per_entry = 100  # Byte size per entry
    length_string = size_per_entry - 49  # Length of a string that satisfies the byte size
    payload = ['a' * length_string for i in range(data_entries)]  # Create payload as specified
    return payload


def run_problem(payload=None, config=None):
    num_cpu = 1
    tasks = num_cpu
    SingletonStorage(storage=DummyObject())
    # Init ray
    ray.init(num_cpus=num_cpu)
    # Put it in the object storage
    payload_id = ray.put(payload)
    config_id = ray.put(config)
    # Track memory in a naive way
    start_memory = psutil.virtual_memory()[3]
    # Create jobs
    result_id = [dummy_fun.remote(config_id, payload_id) for i in range(tasks)]
    # Run jobs
    result = ray.get(result_id)
    end_memory = psutil.virtual_memory()[3]
    print('Memory usage {} MB'.format((end_memory - start_memory) / 8 / 1000 / 1000))
    ray.shutdown()


print("Payload: None \t config: Dummy Object")
run_problem(payload=None, config=DummyObject())
print("-" * 100)
print("Payload: 100 MB \t config: None")
run_problem(payload=create_data(target_size_mb=100), config=None)
print("-" * 100)
print("Payload: 100 MB \t config: Dummy Object")
run_problem(payload=create_data(target_size_mb=100), config=DummyObject())
print("-" * 100)
Upvotes: 1
Views: 691
Reputation: 571
I reproduced what you're describing and found that the memory consumption per task is constant when both (config_obj, payload) are passed to the task, so total usage scales linearly with the number of tasks. However, I think you've found an issue in Ray: each task takes more memory when both (config_obj, payload) are passed, and the amount of additional memory per task is nearly equal to the size of the payload. See my numbers below for more data. I've asked the Ray Core team about this; see this Discuss thread.
To work around this behavior, I suggest you simplify the config object by either 1) removing references to external packages or 2) making it a Ray actor and having tasks invoke remote methods on it (a sketch of this option follows the code below). Option (1) is probably easier and simpler. I'm not sure what requirements you have, but following your example, I'd define DummyObject this way:
class DummyObject():
    def __init__(self, nltk_version):
        self.nltk_version = nltk_version

    def do_something(self):
        print(self.nltk_version)


import nltk

# In the driver process, or in a Ray task/actor as long as you don't
# need to scale it.
config = DummyObject(nltk.__version__)
config_id = ray.put(config)
payload_id = ...  # omitted
results = [dummy_fun.remote(config_id, payload_id) for i in range(16)]
ray.get(results)
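If you'd rather go with option (2), here is a rough sketch of the actor approach (names are illustrative, and it reuses create_data from your example): the nltk-dependent logic lives in a single actor process, so tasks never receive the object as an argument and nltk is imported exactly once.

import ray

@ray.remote
class ConfigActor:
    def __init__(self):
        import nltk  # imported only inside the actor's process
        self._nltk_version = nltk.__version__

    def do_something(self):
        return self._nltk_version

@ray.remote
def dummy_fun_with_actor(config_actor, payload):
    # Call a remote method on the shared actor instead of carrying the
    # object around; the payload is still shared via the object store.
    return ray.get(config_actor.do_something.remote())

ray.init()
config_actor = ConfigActor.remote()
payload_id = ray.put(create_data(target_size_mb=10))
results = ray.get([dummy_fun_with_actor.remote(config_actor, payload_id)
                   for _ in range(16)])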
Tests were run on a 16-CPU machine (AWS c5.4xlarge, Ray 1.13). With a 10 MB payload:
+--------------+-----------------+-----------+---------+------------------+
| with_payload | with_config_obj | num_tasks | used_mb | used_mb_per_task |
+--------------+-----------------+-----------+---------+------------------+
| True | True | 1 | 28.47 | 28.47 |
| True | True | 8 | 209.51 | 26.19 |
| True | True | 16 | 419.36 | 26.21 |
| False | True | 1 | 18.27 | 18.27 |
| False | True | 8 | 130.23 | 16.28 |
| False | True | 16 | 256.55 | 16.03 |
| True | False | 1 | 3.01 | 3.01 |
| True | False | 8 | 14.65 | 1.83 |
| True | False | 16 | 29.07 | 1.82 |
| False | False | 1 | 0.52 | 0.52 |
| False | False | 8 | 0.52 | 0.07 |
| False | False | 16 | 2.82 | 0.18 |
+--------------+-----------------+-----------+---------+------------------+

With a 100 MB payload:
+--------------+-----------------+-----------+---------+------------------+
| with_payload | with_config_obj | num_tasks | used_mb | used_mb_per_task |
+--------------+-----------------+-----------+---------+------------------+
| True | True | 1 | 117.09 | 117.09 |
| True | True | 8 | 933.07 | 116.63 |
| True | True | 16 | 1862.18 | 116.39 |
| False | True | 1 | 16.9 | 16.9 |
| False | True | 8 | 129.67 | 16.21 |
| False | True | 16 | 255.3 | 15.96 |
| True | False | 1 | 2.48 | 2.48 |
| True | False | 8 | 14.35 | 1.79 |
| True | False | 16 | 28.56 | 1.78 |
| False | False | 1 | 0.65 | 0.65 |
| False | False | 8 | 1.6 | 0.2 |
| False | False | 16 | 0.87 | 0.05 |
+--------------+-----------------+-----------+---------+------------------+
With the nltk reference removed:
+--------------+-----------------+-----------+---------+------------------+
| with_payload | with_config_obj | num_tasks | used_mb | used_mb_per_task |
+--------------+-----------------+-----------+---------+------------------+
| True | True | 1 | 2.02 | 2.02 |
| True | True | 8 | 15.64 | 1.95 |
| True | True | 16 | 28.29 | 1.77 |
| False | True | 1 | 0.31 | 0.31 |
| False | True | 8 | 4.46 | 0.56 |
| False | True | 16 | 7.57 | 0.47 |
| True | False | 1 | 2.24 | 2.24 |
| True | False | 8 | 14.12 | 1.77 |
| True | False | 16 | 28.14 | 1.76 |
| False | False | 1 | 0.52 | 0.52 |
| False | False | 8 | 1.08 | 0.13 |
| False | False | 16 | 2.82 | 0.18 |
+--------------+-----------------+-----------+---------+------------------+
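For reference, the numbers above come from a sweep in the spirit of the question's run_problem; a rough sketch of such a harness (not my exact script, and it reuses create_data, DummyObject, and dummy_fun from the question) looks like this:

import itertools

import psutil
import ray

def measure(with_payload, with_config_obj, num_tasks):
    ray.init(num_cpus=num_tasks)
    payload_id = ray.put(create_data(target_size_mb=10) if with_payload else None)
    config_id = ray.put(DummyObject if with_config_obj else None)
    # virtual_memory().used is reported in bytes
    start = psutil.virtual_memory().used
    ray.get([dummy_fun.remote(config_id, payload_id) for _ in range(num_tasks)])
    used_mb = (psutil.virtual_memory().used - start) / 1000 / 1000
    ray.shutdown()
    return used_mb

for with_payload, with_config_obj in itertools.product([True, False], repeat=2):
    for num_tasks in (1, 8, 16):
        used_mb = measure(with_payload, with_config_obj, num_tasks)
        print(with_payload, with_config_obj, num_tasks,
              round(used_mb, 2), round(used_mb / num_tasks, 2))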
Upvotes: 1