Reputation: 433
I have some dataframes in a dictionary, and I want to merge all of them on a common column, "date". To do so, I used the following code:
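# Assumption: dictionnary_keys is the list of the dictionary's keys
dictionnary_keys = list(dictionary.keys())
n = len(dictionary)
something = dictionary[dictionnary_keys[0]]
for i in range(1, n):
    something = something.merge(dictionary[dictionnary_keys[i]], on="date")
    print(something.shape)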
Note that every value in the dictionary is a pandas dataframe of shape (500, 10).
When I run that code, I get a memory error because both the number of rows and the number of columns increase. However, only the number of columns should increase. I don't understand why I get this result.
Can someone explain to me how to deal with such a situation?
Thank you for your help. If you want more information, just let me know :)
Upvotes: 2
Views: 1408
Reputation: 7065
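You most likely have duplicated "date" values. When a key appears more than once, merge pairs every matching row on the left with every matching row on the right, so the number of rows multiplies with each merge.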
Here is a quick example:
import numpy as np
import pandas as pd

# Generate a dict of DataFrames with a duplicated key ('B') in column 'a'
d = dict()
for i in range(4):
    d[i] = pd.DataFrame({'a': list('ABBCD'), 'b': np.random.randint(0, 10, 5), 'c': i})
n = len(d)
s = d[0]
for i in range(1, n):
    s = s.merge(d[i], on="a")
    print(s.shape)
(7, 5)
(11, 7)
(19, 9)
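The growth in rows can be checked by hand: for each key, an inner merge produces (rows on the left) x (rows on the right) rows. Below is a minimal sketch of that arithmetic, assuming no missing values in the key column; expected_merge_rows is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def expected_merge_rows(left, right, on):
    """Predict the row count of an inner merge on `on` (hypothetical helper)."""
    left_counts = left[on].value_counts()
    right_counts = right[on].value_counts()
    # Multiply per-key counts; a key missing on either side contributes 0 rows.
    return int(left_counts.mul(right_counts, fill_value=0).sum())

left = pd.DataFrame({"a": list("ABBCD"), "b": range(5)})
right = pd.DataFrame({"a": list("ABBCD"), "c": range(5)})

predicted = expected_merge_rows(left, right, "a")
actual = len(left.merge(right, on="a"))
print(predicted, actual)
```

Here 'B' appears twice on both sides and contributes 2 x 2 = 4 rows, while the three unique keys contribute one row each, giving 7. The following merges give 8 + 3 = 11 and 16 + 3 = 19 rows, which matches the shapes printed above.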
Re-run with no duplicates:
d = dict()
for i in range(4):
    d[i] = pd.DataFrame({'a': list('ABCDE'), 'b': np.random.randint(0, 10, 5), 'c': i})
n = len(d)
s = d[0]
for i in range(1, n):
    s = s.merge(d[i], on="a")
    print(s.shape)
(5, 5)
(5, 7)
(5, 9)
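Before merging, it may help to check each frame for repeated keys. The sketch below assumes the frames live in a dict called frames (a made-up name with made-up data) and that keeping the first row per date is acceptable for your data, which may not be the case. The validate argument of merge makes pandas raise an error instead of silently multiplying rows.

```python
import pandas as pd

# Hypothetical sample data: frame "x" has a repeated date.
frames = {
    "x": pd.DataFrame({"date": ["2020-01-01", "2020-01-02", "2020-01-02"],
                       "v1": [1, 2, 3]}),
    "y": pd.DataFrame({"date": ["2020-01-01", "2020-01-02"],
                       "v2": [10, 20]}),
}

# Count repeated "date" values in each frame.
dup_counts = {name: int(df["date"].duplicated().sum()) for name, df in frames.items()}
print(dup_counts)

# validate="one_to_one" raises MergeError when keys are not unique on both sides.
try:
    frames["x"].merge(frames["y"], on="date", validate="one_to_one")
    merge_failed = False
except pd.errors.MergeError:
    merge_failed = True
print(merge_failed)

# One possible fix (an assumption about the desired result): keep the first row per date.
deduped = {name: df.drop_duplicates(subset="date") for name, df in frames.items()}
merged = deduped["x"].merge(deduped["y"], on="date", validate="one_to_one")
print(merged.shape)
```

Whether dropping, aggregating, or keeping the duplicates is right depends on what the repeated dates mean in your data.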
Merging in this way can also cause trouble with column names: the default suffixes _x and _y are reused at each step, so several columns end up sharing the same name:
   a  b_x  c_x  b_y  c_y  b_x  c_x  b_y  c_y
0  A    4    0    5    1    0    2    9    3
1  B    5    0    8    1    3    2    0    3
2  C    6    0    0    1    5    2    8    3
3  D    2    0    0    1    8    2    8    3
4  E    8    0    2    1    7    2    9    3
s['b_x']
   b_x  b_x
0    4    0
1    5    3
2    6    5
3    2    8
4    8    7
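One way to sidestep the repeated suffixes is to make the column names unique before combining (recent pandas versions may also refuse a merge that would create duplicate column names, rather than return a frame like the one above). The sketch below assumes the values in 'a' are unique within each frame. Tagging each column with its dictionary key is just one possible naming scheme, not something taken from the question.

```python
import numpy as np
import pandas as pd

d = {i: pd.DataFrame({"a": list("ABCDE"),
                      "b": np.random.randint(0, 10, 5),
                      "c": i})
     for i in range(4)}

# Index each frame by the key column and tag the other columns with the dict key.
tagged = [df.set_index("a").add_suffix(f"_{key}") for key, df in d.items()]

# Align the frames on their index; join="inner" keeps keys present in every frame.
s = pd.concat(tagged, axis=1, join="inner").reset_index()
print(s.shape)
print(list(s.columns))
```

Each column now has a distinct name such as b_0 or c_3, so a selection like s['b_0'] returns a single series.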
Upvotes: 2