Gaz Zhao
Gaz Zhao

Reputation: 39

Transpose a dataframe to a nested list of list

I got situation where I need to transpose a dataframe like below.

input dataframe is as below:

input_data = [
    ['Asia', 'China', 'Beijing'],
    ['Asia', 'China', 'Shenzhen'],
    ['America', 'United States', 'New York'],
    ['America', 'Canada', 'Toronto']
]
input_df = pd.DataFrame(input_data)
input_df.columns = ['continents', 'countries', 'cities']
input_df
continents countries cities
0 Asia China Beijing
1 Asia China Shenzhen
2 America United States New York
3 America Canada Toronto

The output data I want to get is

# only the unique values are allowed in the output list.
continents = ['Asia', 'America']
countries = [['China'], ['United States', 'Canada']]
cities = [[['Beijing', 'Shenzhen']], [['New York'], ['Toronto']]]

For this case, the input data has three levels Continents -> Countries -> Cities, but what I ultimately want is to take a multiple-level hierarchical dataframe (no matters how deep it is horizontally), then I get the output like the example, and then I will put them on a pyqt5 column view.

Upvotes: 0

Views: 108

Answers (1)

Ynjxsjmh
Ynjxsjmh

Reputation: 30022

pandas.Series.tolist() can convert series value to list.

print(input_df['continents'].unique().tolist())
print(input_df.groupby('continents', sort=False)['countries'].apply(lambda x: x.unique().tolist()).tolist())
print(input_df.groupby(['continents', 'countries'], sort=False)['cities'].apply(lambda x: [x.unique().tolist()]).tolist())
['Asia', 'America']
[['China'], ['United States', 'Canada']]
[[['Beijing', 'Shenzhen']], [['New York']], [['Toronto']]]

As for a general approach, the first approach occurred to me is to loop through the columns of df.

def list_wrapper(alist, times):
    for _ in range(times):
        alist = [alist]
    return alist

columns_name = input_df.columns.values.tolist()
for i in range(len(columns_name)):
    if i == 0:
        print(input_df[columns_name[i]].unique().tolist())
    else:
        print(input_df.groupby(columns_name[0:i], sort=False)[columns_name[i]].apply(lambda x: list_wrapper(x.unique().tolist(), i-1)).tolist())

Upvotes: 2

Related Questions