Mike L
Mike L

Reputation: 47

Create a DataFrame in Pandas with variable number of columns

Good afternoon, Here is my problem: I have a Pandas dataframe that looks like it:

Birth_Year      Name
1964            [X, Y]
1965            [A, B]
1964            [F, G, H]

I need to transform it to the following:

Birth_Year      Name_1          Name_2         Name_3
1964            X               Y
1965            A               B
1964            F               G              H

So I need to get the max length of lists in column 'Name', create the datafame with column for each item in the list, and populate it with the values from the list.

Would really appreciate an advice on how to handle this problem.

I have no good solution for it.

Upvotes: 1

Views: 1039

Answers (4)

Azhar Khan
Azhar Khan

Reputation: 4108

One way of implementing this is by splitting array and returning multiple columns form pd.apply:

df.apply(lambda row: [row["Name"][i-1] if i <= len(row["Name"]) else "" for i in range(1, max_len+1)], axis=1)

Full working example (with corner-case tested when array is empty):

data=[
  [1964,["X", "Y"]],
  [1965,["A", "B"]],
  [1964,["F", "G", "H"]],
  [1966,["P"]],
  [1967,[]],
]

columns = ["Birth_Year","Name"]

df = pd.DataFrame(data=data, columns=columns)

max_len = df["Name"].apply(len).max()

df[[f"Name_{i}" for i in range(1, max_len+1)]] = df.apply(lambda row: [row["Name"][i-1] if i <= len(row["Name"]) else "" for i in range(1, max_len+1)], axis=1, result_type="expand")

[Out]:
   Birth_Year        Name  Name_1  Name_2  Name_3
0        1964      [X, Y]       X       Y        
1        1965      [A, B]       A       B        
2        1964   [F, G, H]       F       G       H
3        1966         [P]       P                
4        1967          []                        

Upvotes: 0

Anoushiravan R
Anoushiravan R

Reputation: 21938

You can also try this:

df = pd.concat([df['Birth_Year'],
                (df['Name'].apply(lambda x: pd.Series([i for i in x]))
                 .pipe(lambda df2: df2.set_axis([f'Name_{i}' for i in range(1, len(df2.columns)+1)], axis=1)))
                ], axis=1)
df

   Birth_Year Name_1 Name_2 Name_3
0        1964      X      Y    NaN
1        1965      A      B    NaN
2        1964      F      G      H

Upvotes: 0

ak_slick
ak_slick

Reputation: 1016

To get close to your data example:

import pandas as pd

df = pd.DataFrame(index=[1964, 1965, 1964],
                  data={'Name': [['X', 'Y'], ['A', 'B'], ['F', 'G', 'H']]})
df.index.name = 'Birth Year'

Solution:

df = df.Name.str.join(',').str.split(',', expand=True).fillna('')
df = df.rename(columns=lambda c: f'Name_{c+1}')

I found relying on the string methods to make this pretty easy to follow. But I think mozway's answer should be preferred. I believe it to be more efficient and faster.

Output:

           Name_1 Name_2 Name_3
Birth Year                     
1964            X      Y       
1965            A      B       
1964            F      G      H

Upvotes: 0

mozway
mozway

Reputation: 261850

Assuming you have real lists in "Name", you can use:

df = df.join(pd.DataFrame(df.pop('Name').tolist(), index=df.index)
               .fillna('')
               .rename(columns=lambda c: f'Name_{c+1}')
             )

Output:

   Birth_Year Name_1 Name_2 Name_3
0        1964      X      Y       
1        1965      A      B       
2        1964      F      G      H

Upvotes: 2

Related Questions