Python Numpy Array Append with Blank Columns

Question

I have two NumPy arrays and I am using the top row of each as column headers. The arrays share the same columns except for two: arr2 has a different C column (C2 instead of C1) as well as an additional column (C3).

How can I combine all of these columns into a single np array?

import numpy as np

arr1 = [['A', 'B', 'C1'], [1, 1, 0], [0, 1, 1]]
arr2 = [['A', 'B', 'C2', 'C3'], [0, 1, 0, 1], [0, 0, 1, 0]]
a1 = np.array(arr1)
a2 = np.array(arr2)

# Fails with a ValueError: a1 has 3 columns, a2 has 4
b = np.append(a1, a2, axis=0)
print(b)

# Desired Result
# A B C1 C2 C3
# 1 1  0  -  -
# 0 1  1  -  -
# 0 1  -  0  1
# 0 0  -  1  0

Matt Hall · Accepted Answer

NumPy arrays aren't great for handling data with named columns, which might contain different types. Instead, I would use pandas for this. For example:

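import pandas as pd

# Data rows only; the header rows become the column names below
arr1 = [[1, 1, 0], [0, 1, 1]]
arr2 = [[0, 1, 0, 1], [0, 0, 1, 0]]

df1 = pd.DataFrame(arr1, columns=['A', 'B', 'C1'])
df2 = pd.DataFrame(arr2, columns=['A', 'B', 'C2', 'C3'])

# Stack the rows; columns missing from one frame are filled with NaN
df = pd.concat([df1, df2], sort=False)

# index=False leaves the row labels out of the file
df.to_csv('mydata.csv', index=False)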

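This results in a 'dataframe', a spreadsheet-like data structure. Jupyter Notebooks render these as a table; the plain-text equivalent from print(df) is:

   A  B   C1   C2   C3
0  1  1  0.0  NaN  NaN
1  0  1  1.0  NaN  NaN
0  0  1  NaN  0.0  1.0
1  0  0  NaN  1.0  0.0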

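You might notice there's an extra column on the left; this is the "index", which you can think of as row labels. Because of index=False it is not written to your CSV, but note that after the concat the labels repeat (0, 1, 0, 1). If you carry on doing things in the dataframe, you might want to do df = df.reset_index(drop=True) to relabel the rows 0 to 3. A plain df.reset_index() also relabels them, but keeps the old labels as a new column called index.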
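As a minimal sketch of the relabelling point above, the snippet below rebuilds the two frames from the answer and compares two ways of getting fresh row labels. The variable names relabelled and direct are only illustrative; ignore_index=True is a pd.concat option that is not used in the original answer.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 1, 0], [0, 1, 1]], columns=["A", "B", "C1"])
df2 = pd.DataFrame([[0, 1, 0, 1], [0, 0, 1, 0]], columns=["A", "B", "C2", "C3"])

# Each frame keeps its own row labels, so they repeat after concatenation.
df = pd.concat([df1, df2], sort=False)
print(df.index.tolist())

# Option 1: discard the old labels and number the rows from zero.
relabelled = df.reset_index(drop=True)

# Option 2: tell concat to ignore the incoming labels in the first place.
direct = pd.concat([df1, df2], sort=False, ignore_index=True)

print(relabelled.index.tolist())
print(relabelled.equals(direct))
```

Either option should give the same frame; which one reads better is mostly a matter of taste.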

If you want the dataframe back as a NumPy array, you can do df.values (or df.to_numpy() in newer pandas versions) and away you go. The array doesn't carry the column names, though.
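A rough sketch of that round trip, assuming a pandas version that has to_numpy(). The names values and names are illustrative. Because the blank cells are NaN, the resulting array ends up with a floating-point dtype, and the column names have to be kept separately if you still need them.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 1, 0], [0, 1, 1]], columns=["A", "B", "C1"])
df2 = pd.DataFrame([[0, 1, 0, 1], [0, 0, 1, 0]], columns=["A", "B", "C2", "C3"])
df = pd.concat([df1, df2], sort=False, ignore_index=True)

# The data only: a 2-D float array in which missing cells are NaN.
values = df.to_numpy()

# The column names are not part of `values`, so keep them alongside it.
names = df.columns.to_numpy()

print(values.shape, values.dtype)
print(names)
```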

Last thing: if you really want to stay in NumPy-land, then check out structured arrays, which give you another way to name the columns, essentially, in an array. Honestly, since pandas came along, I hardly ever see these in the wild.
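For completeness, here is one possible way to do the same thing with a structured array. It is only a sketch: the helper to_structured and the choice to store every field as a float (so that NaN can stand in for a blank) are assumptions made for this example, not something from the question.

```python
import numpy as np

FIELDS = ("A", "B", "C1", "C2", "C3")
# One float field per column name, so NaN can mark a missing value.
row_dtype = np.dtype([(name, "f8") for name in FIELDS])

def to_structured(rows, names):
    """Put `rows` into the fields listed in `names`; other fields stay NaN."""
    out = np.empty(len(rows), dtype=row_dtype)
    for name in FIELDS:
        out[name] = np.nan
    # zip(*rows) transposes the row lists into one tuple per column.
    for name, column in zip(names, zip(*rows)):
        out[name] = column
    return out

s1 = to_structured([[1, 1, 0], [0, 1, 1]], ["A", "B", "C1"])
s2 = to_structured([[0, 1, 0, 1], [0, 0, 1, 0]], ["A", "B", "C2", "C3"])

# Both arrays share one dtype, so they concatenate like ordinary 1-D arrays.
combined = np.concatenate([s1, s2])
print(combined.dtype.names)
print(combined["C2"])
```

Columns are then read by name, for example combined["A"]. This works, but it takes noticeably more code than the pandas version.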
