How to: (1) make a copy of a NumPy array, (2) modify this copy, and (3) return the modified copy

Question

My goal is to write a function that (1) makes a copy of a numpy array, (2) modifies this copy, and (3) returns the modified copy. However, this doesn't work as I thought it would...

To show a simple example, let's assume I have a simple function for z-score normalization:

import numpy as np

def standardizing1(array, columns, ddof=0):

    ary_new = array.copy()
    if len(ary_new.shape) == 1:
        ary_new = ary_new[:, np.newaxis]

    return (ary_new[:, columns] - ary_new[:, columns].mean(axis=0)) /\
                       ary_new[:, columns].std(axis=0, ddof=ddof)

And the results are what I expect:

>>> ary = np.array([[1, 10], [2, 9], [3, 8], [4, 7], [5, 6], [6, 5]])
>>> standardizing1(ary, [0, 1])

array([[-1.46385011,  1.46385011],
       [-0.87831007,  0.87831007],
       [-0.29277002,  0.29277002],
       [ 0.29277002, -0.29277002],
       [ 0.87831007, -0.87831007],
       [ 1.46385011, -1.46385011]])

However, suppose I want to write the result back into the copy and return the modified copy. I don't understand why this doesn't work. For example:

def standardizing2(array, columns, ddof=0):

    ary_new = array.copy()
    if len(ary_new.shape) == 1:
        ary_new = ary_new[:, np.newaxis]

    ary_new[:, columns] = (ary_new[:, columns] - ary_new[:, columns].mean(axis=0)) /\
                       ary_new[:, columns].std(axis=0, ddof=ddof)

    # some more processing steps with ary_new
    return ary_new

>>> ary = np.array([[1, 10], [2, 9], [3, 8], [4, 7], [5, 6], [6, 5]])
>>> standardizing2(ary, [0, 1])

array([[-1,  1],
       [ 0,  0],
       [ 0,  0],
       [ 0,  0],
       [ 0,  0],
       [ 1, -1]])

But if I assign the result to a new array, instead of assigning into a slice of the copy, it works again:

def standardizing3(array, columns, ddof=0):

    ary_new = array.copy()
    if len(ary_new.shape) == 1:
        ary_new = ary_new[:, np.newaxis]

    some_ary = (ary_new[:, columns] - ary_new[:, columns].mean(axis=0)) /\
                       ary_new[:, columns].std(axis=0, ddof=ddof)

    return some_ary

>>> ary = np.array([[1, 10], [2, 9], [3, 8], [4, 7], [5, 6], [6, 5]])
>>> standardizing3(ary, [0, 1])

array([[-1.46385011,  1.46385011],
       [-0.87831007,  0.87831007],
       [-0.29277002,  0.29277002],
       [ 0.29277002, -0.29277002],
       [ 0.87831007, -0.87831007],
       [ 1.46385011, -1.46385011]])

user2357112 · Accepted Answer

When you do

ary = np.array([[1, 10], [2, 9], [3, 8], [4, 7], [5, 6], [6, 5]])

you create an array of integer dtype. That means that

ary_new = array.copy()

is also an array of integer dtype. It cannot hold floating-point numbers; when you try to put floats into it:

ary_new[:, columns] = ...

they are automatically cast to integers.
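A minimal sketch of that cast, using a small hypothetical array `a` that is not part of the question. The float values are chosen to resemble the z-scores above:

```python
import numpy as np

a = np.array([1, 2, 3])        # integer dtype is inferred from the integer literals
a[:] = [-1.46, 0.29, 1.46]     # assigning into the array casts the floats to its dtype

print(a.dtype.kind)            # 'i' -- still an integer array
print(a)                       # [-1  0  1] -- fractional parts are dropped
```

The assignment raises no error or warning, which is why `standardizing2` quietly returns truncated values.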

If you want an array of floats, you have to specify that when you create the copy. `astype` returns a new array by default, so it can replace the call to `copy()`:

ary_new = array.astype(float)
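Putting this together, here is a sketch of how the second function from the question might look with that one-line change. The name `standardizing_float` and the local variable `cols` are only illustrative; the rest follows the question's code:

```python
import numpy as np

def standardizing_float(array, columns, ddof=0):
    # astype(float) returns a new float64 array, so the caller's array is not modified
    ary_new = array.astype(float)
    if ary_new.ndim == 1:
        ary_new = ary_new[:, np.newaxis]

    # Indexing with a list of columns yields a copy of those columns
    cols = ary_new[:, columns]
    # The target array is float now, so the z-scores are stored without truncation
    ary_new[:, columns] = (cols - cols.mean(axis=0)) / cols.std(axis=0, ddof=ddof)

    # some more processing steps with ary_new
    return ary_new

ary = np.array([[1, 10], [2, 9], [3, 8], [4, 7], [5, 6], [6, 5]])
result = standardizing_float(ary, [0, 1])
print(result.dtype)      # float64
print(ary.dtype.kind)    # 'i' -- the original integer array is untouched
```

With the question's input, `result` should match the output of `standardizing1`, while `ary` keeps its original integer values.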
