Reputation: 4439
I have a program that outputs arrays.
For example:
[[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]]
I would like to turn these arrays into a DataFrame using pandas. However, when I do, the values become row values like this:
As you can see each array within the overall array becomes its own row. I would like each array within the overall array to become its own column with a column name.
Furthermore, in my use case, the number of arrays within the array is variable. There could be 4 arrays or 70, which means there could be 4 columns or 70. This is a problem when it comes to column names, and I was wondering if there is any way to auto-increment column names in Python.
Check out my attempt below and let me know how I can solve this.
My desired outcome is simply to make each array within the overall array into its own column instead of row and to have titles for the column that increment with each additional array/column.
Thank you so much.
Need help. Please respond!
import numpy as np
import pandas as pd
frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]]
numpy_data = np.array(frame)
df = pd.DataFrame(data=numpy_data, columns=["column1", "column2", "column3"])
print(frame)
print(df)
Upvotes: 8
Views: 9351
Reputation: 3594
A possible solution is to transpose the DataFrame and rename the columns after converting the numpy array into a DataFrame. Here is the code:
import numpy as np
import pandas as pd
frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]]
numpy_data = np.array(frame)
# build the DataFrame, then transpose so each inner array becomes a column
df = pd.DataFrame(data=numpy_data).T
# name the columns with a list comprehension; df.shape[1] is the column count, so no number is hard-coded
df.columns = [f'mycol{i}' for i in range(df.shape[1])]
print(df)
Output:
   mycol0  mycol1  mycol2  mycol3
0       0       0       1       2
1       1       0       3       4
2       0       0       3       4
Same code for 11 columns:
import numpy as np
import pandas as pd
frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4], [5, 2, 2], [6, 7, 8], [8, 9, 19], [10, 2, 4], [2, 6, 5], [10, 2, 5], [11, 2, 9]]
numpy_data = np.array(frame)
df = pd.DataFrame(data=numpy_data).T
df.columns = [f'mycol{i}' for i in range(df.shape[1])]
print(df)
Output:
   mycol0  mycol1  mycol2  mycol3  mycol4  mycol5  mycol6  mycol7  mycol8  mycol9  mycol10
0       0       0       1       2       5       6       8      10       2      10       11
1       1       0       3       4       2       7       9       2       6       2        2
2       0       0       3       4       2       8      19       4       5       5        9
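The two runs above differ only in the input list, so the steps can be wrapped in a small function. This is a minimal sketch, assuming every inner array has the same length; `arrays_to_columns` and its `prefix` parameter are hypothetical names for illustration, not part of pandas:

```python
import numpy as np
import pandas as pd


def arrays_to_columns(arrays, prefix="mycol"):
    """Hypothetical helper: return a DataFrame with one column per inner array."""
    # Transpose so each inner array runs down a column instead of across a row
    df = pd.DataFrame(np.array(arrays).T)
    # df.shape[1] is the number of columns, whatever the length of the input
    df.columns = [f"{prefix}{i}" for i in range(df.shape[1])]
    return df


small = arrays_to_columns([[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]])
print(list(small.columns))  # ['mycol0', 'mycol1', 'mycol2', 'mycol3']
```

The same call should work unchanged for 4 or 70 inner arrays, since the column names are generated from the shape rather than typed out.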
Upvotes: 5
Reputation: 323236
Let us try
pd.DataFrame(dict(zip(range(len(frame)), frame)))
   0  1  2  3
0  0  0  1  2
1  1  0  3  4
2  0  0  3  4
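Why this one-liner works may not be obvious: when pandas builds a DataFrame from a dict, each key becomes a column label and each value becomes that column's data. A short sketch that splits the one-liner into steps (the variable name `columns_by_position` is only illustrative):

```python
import pandas as pd

frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]]

# Pair each inner list with its position: {0: [0, 1, 0], 1: [0, 0, 0], ...}
columns_by_position = dict(zip(range(len(frame)), frame))

# Each dict key becomes a column label, each value that column's data
df = pd.DataFrame(columns_by_position)

print(df.shape)          # (3, 4): three rows, one column per inner list
print(list(df.columns))  # [0, 1, 2, 3]
```

Note that the column labels here are integers, not strings, so a column is selected with `df[2]` rather than `df['2']`.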
Upvotes: 1
Reputation: 18208
One way is to convert the list to a dictionary keyed by column name, iterating over each item in the list as below:
df = pd.DataFrame({'column{}'.format(index + 1): i for index, i in enumerate(frame)})
Alternatively, you can transpose what you already have. You can leave the column names out when creating the DataFrame and add them afterwards (not sure if you need numpy):
df = pd.DataFrame(data=frame)
df = df.T # transposing
df.columns = ['column{}'.format(i+1) for i in df.columns] # adding column names
Result (either way):
   column1  column2  column3  column4
0        0        0        1        2
1        1        0        3        4
2        0        0        3        4
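To check that the dictionary route and the transpose route really agree, the two frames can be compared directly. A minimal sketch, with both versions numbering columns from 1 (so the dictionary keys use `index + 1`); the names `from_dict` and `from_transpose` are only illustrative:

```python
import pandas as pd

frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]]

# Approach 1: dict of column name -> inner list, numbered from 1
from_dict = pd.DataFrame(
    {"column{}".format(index + 1): values for index, values in enumerate(frame)}
)

# Approach 2: build row-wise, transpose, then rename the integer labels
from_transpose = pd.DataFrame(data=frame).T
from_transpose.columns = ["column{}".format(i + 1) for i in from_transpose.columns]

# Raises AssertionError if values, dtypes, index or column labels differ
pd.testing.assert_frame_equal(from_dict, from_transpose)
```

If the dictionary keys were built with `index` instead of `index + 1`, the data would match but the labels would run from column0 to column3.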
Upvotes: 1
Reputation: 5036
You can transpose the array and use add_prefix:
import numpy as np
import pandas as pd
frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4]]
pd.DataFrame(np.array(frame).T).add_prefix('column')
Out:
   column0  column1  column2  column3
0        0        0        1        2
1        1        0        3        4
2        0        0        3        4
This works with any number of arrays:
frame = [[0, 1, 0], [0, 0, 0], [1, 3, 3], [2, 4, 4], [1,0,1], [2,0,3]]
pd.DataFrame(np.array(frame).T).add_prefix('column')
Out:
   column0  column1  column2  column3  column4  column5
0        0        0        1        2        1        2
1        1        0        3        4        0        0
2        0        0        3        4        1        3
Upvotes: 3