Reputation: 11
I have a pandas DataFrame composed of a list of objects and then 4 lists of 12 values for each object. It has the general form:
I would like to transpose the dataframe and have hierarchical indices ('Name', 'names of 4 lists'). The general form of this would look like
I have tried the following, with rows_list being my source data:
import pandas as pd
test_table = pd.DataFrame(rows_list, columns=("name", "frac_0", "frac_1","frac_2", "frac_3"))
name = pd.Series(test_table['name'])
del test_table['name']
test_table = test_table.T
test_table = test_table.sort_index([subjectname])
This gives me a TypeError that states "unhashable type: 'list'".
A simple test_table.T operation also doesn't give me what I need: I need the columns to correspond to the items in the (List1, List2, etc.) lists, and the rows to be indexed by name and then by List1, List2. I've gone back and forth between adding new columns and trying to build a brand-new DataFrame from multiple Series, but nothing seems to work.
Thanks for your help!
Upvotes: 1
Views: 567
Reputation: 496
Mock df:
import numpy as np
import pandas as pd
df = pd.DataFrame(columns=['Name', 'List 1', 'List 2'], data=[['A', [1,2,3,4], [1,2,3,4]], ['B', [1,2,3,4], [1,2,3,4]], ['C', [1,2,3,4], [1,2,3,4]]])
Get 'Name' out of the way:
df.set_index('Name', inplace=True)
            List 1        List 2
Name
A     [1, 2, 3, 4]  [1, 2, 3, 4]
B     [1, 2, 3, 4]  [1, 2, 3, 4]
C     [1, 2, 3, 4]  [1, 2, 3, 4]
n_name = len(df.index)
n_list = len(df.columns)
n_item = len(df.iat[0, 0])
df.values now has shape (3, 2), with a list in each cell. We flatten it to a one-dimensional array, of shape (6,) in this mock df, to remove one dimension, and then convert it to a list of lists.
vals = list(df.values.reshape((n_list * n_name),))
[[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]]
Now we build the values for the two index levels. 'Name' is the outer level, so each name must appear once per list column, back to back; np.repeat does that. The list-column labels form the inner level and must cycle in their original order once per name; np.tile does that. Finally, the new column names are generated:
idx_name = np.repeat(df.index.values, n_list)
idx_list = np.tile(df.columns.values, n_name)
columns = ['Col' + str(n) for n in list(range(1, n_item+1))]
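To make the difference between repeat and tile concrete, here is a small standalone sketch. The arrays names and lists are stand-ins for df.index.values and df.columns.values from the mock df, not new data:

```python
import numpy as np

# Stand-ins for df.index.values and df.columns.values in the mock df
names = np.array(["A", "B", "C"])
lists = np.array(["List 1", "List 2"])

# repeat duplicates each element in place: A, A, B, B, C, C
outer = np.repeat(names, len(lists))

# tile repeats the whole array end to end: List 1, List 2, List 1, ...
inner = np.tile(lists, len(names))

# Pairing them element by element gives one (name, list) label per row
pairs = list(zip(outer.tolist(), inner.tolist()))
print(pairs)
```

The pairs come out in the same row-major order that the reshape of df.values produces, which is why the labels line up with the flattened list of lists.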
Create final df:
df = pd.DataFrame(data=vals, index=[idx_name, idx_list], columns=columns)
          Col1  Col2  Col3  Col4
A List 1     1     2     3     4
  List 2     1     2     3     4
B List 1     1     2     3     4
  List 2     1     2     3     4
C List 1     1     2     3     4
  List 2     1     2     3     4
Code:
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['Name', 'List 1', 'List 2'], data=[['A', [1,2,3,4], [1,2,3,4]], ['B', [1,2,3,4], [1,2,3,4]], ['C', [1,2,3,4], [1,2,3,4]]])
df.set_index('Name', inplace=True)
n_name = len(df.index)        # number of names (rows)
n_list = len(df.columns)      # number of list columns
n_item = len(df.iat[0, 0])    # items per list, assumed equal for every cell
# Flatten the (n_name, n_list) grid of lists, row by row, into a list of lists
vals = list(df.values.reshape(n_list * n_name))
idx_name = np.repeat(df.index.values, n_list)   # A, A, B, B, C, C
idx_list = np.tile(df.columns.values, n_name)   # List 1, List 2, List 1, ...
columns = ['Col' + str(n) for n in range(1, n_item + 1)]
df = pd.DataFrame(data=vals, index=[idx_name, idx_list], columns=columns)
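As a possible alternative, not the approach used above, the same shape can be sketched with DataFrame.stack, which builds the (Name, list label) MultiIndex directly. This assumes every cell holds a list of the same length. The variable names mock, stacked and result are illustrative, and the values differ from the mock df only so that the rows can be told apart:

```python
import pandas as pd

# Same layout as the mock df, with distinct values per cell (illustrative)
mock = pd.DataFrame(
    columns=["Name", "List 1", "List 2"],
    data=[
        ["A", [1, 2, 3, 4], [5, 6, 7, 8]],
        ["B", [9, 10, 11, 12], [13, 14, 15, 16]],
        ["C", [17, 18, 19, 20], [21, 22, 23, 24]],
    ],
)

# stack() moves the column labels into an inner index level, giving a
# Series indexed by (Name, list label) whose values are the lists
stacked = mock.set_index("Name").stack()

# Expand each list into its own row of scalar columns
n_item = len(stacked.iloc[0])
result = pd.DataFrame(
    stacked.tolist(),
    index=stacked.index,
    columns=[f"Col{n}" for n in range(1, n_item + 1)],
)
print(result)
```

This avoids computing the repeat and tile arrays by hand, at the cost of relying on stack to preserve the row-major order.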
Upvotes: 1