dj2560

Reputation: 75

I am getting a memory error while iterating over a pandas DataFrame. How can I resolve this?

I want to multiply each column by a different number and update the values in this DataFrame.

This is the kind of data I have; it is roughly 20000 rows by 40 columns.

The code I have written is:

for j in test.columns:
    for i in r:
        for k in range(len(p)):
            test[i] = test[j].apply(lambda x:x*p[k])
            p.remove(p[k])
            break
        r.remove(i)
        break


p is the list of numbers that I want to multiply by:

p = [74, 46, 97, 2023, 364, 1012, 8, 242, 422, 78, 55, 90, 10, 44, 1, 3, 105, 354, 4, 26, 87, 18, 889, 9, 557, 630, 214, 1765, 760, 3344, 136, 26, 56, 10, 2, 2171, 125, 446, 174, 4, 174, 2, 80, 11, 160, 17, 72]

r is the list of column names.

How can I get rid of this error?

Upvotes: 0

Views: 563

Answers (2)

Valdi_Bo

Reputation: 30971

Your stacktrace points to test[i] = test[j].apply(lambda x:x*p[k]).

Note that j (at least in your code sample) has not been set.

Maybe you should put i instead?

Another solution

If you want to multiply:

  • each column from test,
  • in-place,
  • by consecutive numbers from p (it may be even a plain Python list),
  • but only as many initial elements as p has,
  • assuming that p is not longer than the number of rows in test,

you can do it with the following one-liner:

test.iloc[:len(p)] = test.iloc[:len(p)].apply(lambda col: col * p)

To test this solution, I created a test DataFrame containing the first 10 rows from your sample.

Then I defined p as: p = [2, 3, 4, 5, 6, 7].

The result of my code was:

    0   1    2     3    4
0   6   8    8   282   42
1  39  24   42  1434  153
2   4   0    8   336   48
3  40  20   65  1085  160
4  84  66   72  2130  366
5  91  49  119  3283  469
6   5   6   11   140   17
7   4   8   12   278   51
8   6   8   12   271   36
9  29  25   37   741  149

So, as far as the first 6 rows are concerned, in each column:

  • the first element has been multiplied by 2,
  • the second by 3,
  • and so on.

Maybe this is just what you need?
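The same row-wise scaling can probably be written without apply. The following is only a sketch under assumptions: the 4x2 frame and the three factors are made-up stand-ins for test and p, and DataFrame.mul with axis=0 is swapped in for the lambda so that each factor lines up with a row.

```python
import pandas as pd

# Hypothetical stand-in for `test`; the values are made up for illustration.
test = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
p = [2, 3, 4]  # one factor per row, shorter than the frame

n = len(p)
# axis=0 aligns p with the rows, so row k is multiplied by p[k].
scaled = test.iloc[:n].mul(p, axis=0)
# Write the scaled rows back; rows beyond the first n stay untouched.
test.iloc[:n] = scaled

print(test)
```

Only the first len(p) rows change here, which mirrors the behaviour of the one-liner above.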

Upvotes: 1

Fabrizio

Reputation: 939

I wrote this answer based on your initial statement, "I want to multiply each column with a different number". It's unclear why your code calls remove so many times or why it needs so many nested for loops. For this example, I generated a random DataFrame of 100 rows and 5 columns, and an array of 5 values to multiply by.

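import numpy as np
import pandas as pd

# Random 100x5 DataFrame and one random multiplier per column
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 5)), columns=list('12345'))
p = np.random.randint(0, 100, 5)

# Multiply the i-th column by p[i], in place
for i in range(5):
    df.iloc[:, i] = df.iloc[:, i] * p[i]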
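If one factor per column is really what is wanted, the loop can likely be dropped altogether. This is a minimal sketch, assuming p has exactly one entry per column; the seeded generator and the names rng, result and expected are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is repeatable (an assumption for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 5)), columns=list("12345"))
p = rng.integers(0, 100, size=5)  # one multiplier per column

# Reference result via plain NumPy broadcasting across the columns.
expected = df.to_numpy() * p

# axis="columns" matches p[i] with the i-th column; no Python-level loop.
result = df.mul(p, axis="columns")

print(result.head())
```

The length of p has to equal the number of columns for this to work. The list shown in the question has 47 entries, so it would need to be matched to the actual column count first.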

Upvotes: 1
