Reputation: 1053
I have two NumPy arrays and am using the top row of each as column headers. The arrays share the same columns except for two: arr2 has a different C column (C2 instead of C1) as well as an additional column (C3).
How can I combine all of these columns into a single NumPy array?
arr1 = [ ['A', 'B', 'C1'], [1, 1, 0], [0, 1, 1] ]
arr2 = [ ['A', 'B', 'C2', 'C3'], [0, 1, 0, 1], [0, 0, 1, 0] ]
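import numpy as np
a1 = np.array(arr1)
a2 = np.array(arr2)
# Attempt: raises ValueError because the arrays have different numbers of columns
b = np.append(a1, a2, axis=0)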
print(b)
# Desired Result
# A B C1 C2 C3
# 1 1 0 - -
# 0 1 1 - -
# 0 1 - 0 1
# 0 0 - 1 0
Upvotes: 1
Views: 91
Reputation: 8112
NumPy arrays aren't great for handling data with named columns, which might contain different types. Instead, I would use pandas for this. For example:
import pandas as pd
arr1 = [[1, 1, 0], [0, 1, 1] ]
arr2 = [[0, 1, 0, 1], [0, 0, 1, 0] ]
df1 = pd.DataFrame(arr1, columns=['A', 'B', 'C1'])
df2 = pd.DataFrame(arr2, columns=['A', 'B', 'C2', 'C3'])
df = pd.concat([df1, df2], sort=False)
df.to_csv('mydata.csv', index=False)
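This results in a 'dataframe', a spreadsheet-like data structure. Jupyter Notebooks render these as a table; cells for columns that one of the inputs lacks are filled with NaN:
   A  B   C1   C2   C3
0  1  1  0.0  NaN  NaN
1  0  1  1.0  NaN  NaN
0  0  1  NaN  0.0  1.0
1  0  0  NaN  1.0  0.0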
You might notice there's an extra column on the left; this is the "index", which you can think of as row labels. It isn't written to your CSV because of index=False, but if you carry on working with the dataframe, you might want to do df = df.reset_index(drop=True) to relabel the rows 0, 1, 2, 3 instead of the repeated 0, 1, 0, 1 (drop=True discards the old labels rather than keeping them as a new column).
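A minimal sketch of the relabelling step, assuming the same df1 and df2 as in the example above. It compares resetting the index after the fact with passing ignore_index=True to pd.concat, which should give the same row labels in one step:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 1, 0], [0, 1, 1]], columns=['A', 'B', 'C1'])
df2 = pd.DataFrame([[0, 1, 0, 1], [0, 0, 1, 0]], columns=['A', 'B', 'C2', 'C3'])

# Plain concat keeps each frame's own row labels: 0, 1, 0, 1.
stacked = pd.concat([df1, df2], sort=False)

# Option 1: relabel afterwards, discarding the old labels.
relabelled = stacked.reset_index(drop=True)

# Option 2: let concat assign fresh labels 0..n-1 directly.
combined = pd.concat([df1, df2], sort=False, ignore_index=True)

print(list(stacked.index))
print(list(relabelled.index))
print(relabelled.equals(combined))
```

Either option works; ignore_index=True is simply shorter when the original row labels carry no meaning.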
If you want the dataframe back as a NumPy array, you can do df.values (or df.to_numpy() in recent pandas versions) and away you go. The array doesn't carry the column names, though.
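As a rough sketch of that conversion, again assuming the df1 and df2 from the example above: the column names can be kept in a separate list next to the array. Because the missing cells are NaN, the resulting array is floating point rather than integer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 1, 0], [0, 1, 1]], columns=['A', 'B', 'C1'])
df2 = pd.DataFrame([[0, 1, 0, 1], [0, 0, 1, 0]], columns=['A', 'B', 'C2', 'C3'])
df = pd.concat([df1, df2], sort=False, ignore_index=True)

values = df.to_numpy()          # 2-D float array; missing cells are NaN
columns = df.columns.to_list()  # keep the names alongside the array

print(columns)
print(values)
```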
Last thing: if you really want to stay in NumPy-land, then check out structured arrays, which essentially give you another way to name the columns in an array. Honestly, since pandas came along, I hardly ever see these in the wild.
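For completeness, here is one possible way to do the same thing with a structured array. This is only a sketch: to_structured is a hypothetical helper written for this example (not a NumPy function), and it assumes every column can be stored as a float so that NaN can mark the missing cells.

```python
import numpy as np

# One float field per column; float lets NaN mark the missing cells.
names = ['A', 'B', 'C1', 'C2', 'C3']
dtype = np.dtype([(name, 'f8') for name in names])

def to_structured(rows, columns):
    """Hypothetical helper: copy rows into the shared dtype, NaN where a column is absent."""
    out = np.empty(len(rows), dtype=dtype)
    for name in names:
        out[name] = np.nan            # start with every field missing
    data = np.asarray(rows, dtype=float)
    for i, name in enumerate(columns):
        out[name] = data[:, i]        # fill in the columns this input has
    return out

s1 = to_structured([[1, 1, 0], [0, 1, 1]], ['A', 'B', 'C1'])
s2 = to_structured([[0, 1, 0, 1], [0, 0, 1, 0]], ['A', 'B', 'C2', 'C3'])
combined = np.concatenate([s1, s2])

print(combined.dtype.names)
print(combined['C2'])
```

Fields are then accessed by name, e.g. combined['C2'], much like selecting a dataframe column, but the bookkeeping that pandas does for you has to be written by hand.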
Upvotes: 2