Handling missing data with numpy to concatenate different shape arrays

Question

I decided to get a little more into numpy, but my original data comes from a dataframe. Let's say for now this is the dataframe:

df = pd.DataFrame({
'col1': [101, 200, 306, 402, 500, 600],
'col2': [100, 200, 300, 400, 500, 600]})

I want to perform some basic column-based calculations and save them in the same order inside a numpy array, so I turn the result into a 2D numpy array like this:

arr = np.array(df['col1'] - df['col2']).reshape((-1, 1))
# out
[[1]
 [0]
 [6]
 [2]
 [0]
 [0]]

But then let's say my dataframe updates: the old values of col1 move to col2, new values are added to col1, and a zero is added to col2 wherever the value didn't exist before:

df = pd.DataFrame({
'col1': [103, 220, 316, 406, 501, 606, 348],
'col2': [101, 200, 306, 402, 500, 600, 0]})

So now instead of 6 values I have 7, which is where the complications start. I want to calculate this difference as well and append it to the array in the same order, so I tried this:

arr1 = np.array(df['col1'] - df['col2']).reshape((-1, 1))
arr = np.append(arr, np.zeros((len(arr1 - arr), arr.shape[0])), axis=1)

This was meant to fill the missing values and allow concatenation of both arrays, but it throws ValueError: operands could not be broadcast together with shapes (7,1) (6,1)

I appreciate any help!

Full code and expected output

import numpy as np
import pandas as pd

df = pd.DataFrame({
'col1': [101, 200, 306, 402, 500, 600],
'col2': [100, 200, 300, 400, 500, 600]})

arr = np.array(df['col1'] - df['col2']).reshape((-1, 1))

df = pd.DataFrame({
'col1': [103, 220, 316, 406, 501, 606, 348],
'col2': [101, 200, 306, 402, 500, 600, 0]})

arr1 = np.array(df['col1'] - df['col2']).reshape((-1, 1))

arr = np.append(arr, np.zeros((len(arr1 - arr), arr.shape[1])), axis=0)

arr = np.concatenate((arr, arr1), axis=1)

##EXPECTED##

[[1   2]
 [0  20]
 [6  10]
 [2   4]
 [0   1]
 [0   6]
 [0 348]]

Akshay Sehgal · Accepted Answer

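The ValueError comes from len(arr1 - arr): the subtraction is evaluated first, and shapes (7,1) and (6,1) cannot be broadcast together. Compute the number of missing rows from the shapes instead, and replace the np.append with the following:

1. Create np.zeros((difference in shape[0], arr.shape[1]))
2. np.vstack the arr and the zeros
3. Concatenate arr and arr1 along axis=1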

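# Pad arr with zero rows up to arr1's length; dtype=arr.dtype keeps integers (np.zeros defaults to float)
arr = np.vstack([arr, np.zeros((arr1.shape[0] - arr.shape[0], arr.shape[1]), dtype=arr.dtype)])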

arr = np.concatenate((arr, arr1), axis=1)

print(arr)

# [[1   2]
#  [0  20]
#  [6  10]
#  [2   4]
#  [0   1]
#  [0   6]
#  [0 348]]
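
If this needs to run on every dataframe update, the same steps can be wrapped in a small helper. The following is only a sketch of one possible variant, not part of the answer above: it swaps np.vstack plus np.zeros for np.pad, and the function name pad_rows_and_join and its fill parameter are made up for illustration. It assumes the new array never has fewer rows than the old one.

```python
import numpy as np


def pad_rows_and_join(old, new, fill=0):
    """Pad `old` with rows of `fill` up to new's row count, then join column-wise.

    Hypothetical helper for illustration; assumes both inputs are 2D arrays.
    """
    missing = new.shape[0] - old.shape[0]
    if missing < 0:
        raise ValueError("new has fewer rows than old")
    # pad_width is ((rows before, rows after), (cols before, cols after))
    padded = np.pad(old, ((0, missing), (0, 0)), mode="constant", constant_values=fill)
    return np.concatenate((padded, new), axis=1)


# Differences from the question: first snapshot (6 rows), second snapshot (7 rows)
arr = np.array([1, 0, 6, 2, 0, 0]).reshape(-1, 1)
arr1 = np.array([2, 20, 10, 4, 1, 6, 348]).reshape(-1, 1)

result = pad_rows_and_join(arr, arr1)
print(result)
```

np.pad keeps the dtype of the input array, so integer differences stay integers. Calling the helper again with result and the next column of differences should extend the array one snapshot at a time.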
