Reputation: 23
I am trying to process the data from several subjects, all in one dataframe. There are >30 subjects and 14 computations per subject, so it is a large data set, but anything more than 5 subjects at a time blows up the memory on the scheduler node, even though no workers run on the same node as the scheduler and it has 128 GB of memory. Any ideas how I can get around this, or am I doing something wrong? Code below.
```python
def channel_select(chn, sub):
    # Grab one subject's rows and copy the chosen channel into 's0'
    subject = df.loc[df['sub'] == sub].copy()
    subject['s0'] = subject[chn]
    # Add 13 lagged copies of the channel: column s(x+1) holds the channel
    # shifted back by x samples (negative indices wrap around to the end)
    for x in range(13):
        val = []
        for i in range(len(subject)):
            val.append(subject['s0'].values[i - x])
        subject['s' + str(x + 1)] = val
    return subject
```
```python
subs = df['sub'].unique()
subs = np.delete(subs, [34, 33])   # drop two excluded subjects

# One delayed task per (subject, channel) pair
chn_del = []
for s in subs:
    for c in chn:
        chn_del.append(delayed(channel_select)(c, s))
results = e.persist(chn_del)       # persist every result at once
```
I have the code shown to run all the subjects, but with any more than 5 at a time I run out of memory.
Upvotes: 0
Views: 637
Reputation: 23
As Mary stated above, every call to channel_select creates and stores a dataframe in the scheduler's memory. With 30+ subjects, 14 calls per subject, and a ~2 GB dataframe each time... yeah, you can do the math on how much memory that was trying to grab.
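In case it helps anyone else, here is a minimal sketch of the kind of fix that works (the name channel_select_to_disk and the results/ directory are just illustrative; it assumes the same df, chn, subs, and client e as in the question, plus a filesystem the workers share): have each task write its dataframe to Parquet on the worker and return only the file path, so no 2 GB results ever pile up in one place.

```python
import os
from dask import delayed

os.makedirs("results", exist_ok=True)   # assumes the workers share this filesystem

@delayed
def channel_select_to_disk(chn, sub):
    out = channel_select(chn, sub)                    # the 2 GB dataframe stays on the worker
    path = 'results/{}_{}.parquet'.format(sub, chn)   # hypothetical output location
    out.to_parquet(path)                              # needs pyarrow or fastparquet installed
    return path                                       # only a short string travels back

tasks = [channel_select_to_disk(c, s) for s in subs for c in chn]
paths = e.compute(tasks, sync=True)                   # collect file paths, not dataframes
```

If the lagged frames are needed downstream, they can then be read back lazily with dask.dataframe.read_parquet instead of persisting the pandas objects.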
Upvotes: 0
Reputation: 26
You're telling the computer to hold almost 1,000 GB in memory.
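Back-of-the-envelope, assuming the ~2 GB per result mentioned in the question:

```python
n_subjects = 35      # rough; the question says >30 subjects
n_channels = 14      # computations per subject
gb_each = 2          # approximate size of each returned dataframe
print(n_subjects * n_channels * gb_each, "GB")   # -> 980 GB
```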
But you knew that already (:
Upvotes: 1