Convert lists present in each column to their respective datatypes

Question

I have a sample dataframe as given below.

import numpy as np
import pandas as pd

data = {'ID': ['A', 'B', 'C', 'D'],
    'Age': [[20], [21], [19], [24]],
    'Sex': [['Male'], ['Male'], ['Female'], np.nan],
    'Interest': [['Dance', 'Music'], ['Dance', 'Sports'], ['Hiking', 'Surfing'], np.nan]}


df = pd.DataFrame(data)
df

Apart from ID, each column stores its values inside Python lists. I want to unwrap those lists for all columns while keeping the data types of the values they contain.

The final output should look something like the one shown below.

Any help is greatly appreciated. Thank you.

Peter Leimbigler · Accepted Answer

Option 1. You can use the .str accessor to index into the lists stored in each column (it works the same way on strings and other sequences):

# Replace columns containing length-1 lists with the only item in each list
df['Age'] = df['Age'].str[0]
df['Sex'] = df['Sex'].str[0]

# Pass the variable-length list into the join() string method
df['Interest'] = df['Interest'].apply(', '.join)
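With the four-row data from the question, the last line of Option 1 fails on row D, because ', '.join(np.nan) raises a TypeError. A minimal NaN-tolerant sketch, assuming missing cells should simply stay NaN: .str[0] already maps np.nan to NaN, so only the join needs an isinstance guard.

```python
import numpy as np
import pandas as pd

# Four-row sample from the question; row D holds np.nan instead of lists
df = pd.DataFrame({
    'ID': ['A', 'B', 'C', 'D'],
    'Age': [[20], [21], [19], [24]],
    'Sex': [['Male'], ['Male'], ['Female'], np.nan],
    'Interest': [['Dance', 'Music'], ['Dance', 'Sports'],
                 ['Hiking', 'Surfing'], np.nan],
})

# .str[0] returns NaN for missing cells, so these two lines need no change
df['Age'] = df['Age'].str[0]
df['Sex'] = df['Sex'].str[0]

# Only join cells that really are lists; np.nan is passed through unchanged
df['Interest'] = df['Interest'].apply(
    lambda x: ', '.join(x) if isinstance(x, list) else x
)

print(df)
```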

Option 2. explode Age and Sex (exploding several columns at once requires pandas 1.3 or later), then apply ', '.join to Interest:

df = df.explode(['Age', 'Sex'])
df['Interest'] = df['Interest'].apply(', '.join)

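Options 1 and 2 assume that every cell holds a list: ', '.join raises a TypeError on the np.nan in row D. For rows A to C, which have no missing values, both options return: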

df

  ID  Age     Sex         Interest
0  A   20    Male     Dance, Music
1  B   21    Male    Dance, Sports
2  C   19  Female  Hiking, Surfing
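Since the question also asks to preserve the data types, note that explode leaves Age as an object column even though it only holds integers. A sketch, assuming pandas 1.3+ and the three rows without missing values, that calls DataFrame.infer_objects() afterwards to re-infer an integer dtype:

```python
import pandas as pd

# Rows A to C of the question (no missing values)
df = pd.DataFrame({
    'ID': ['A', 'B', 'C'],
    'Age': [[20], [21], [19]],
    'Sex': [['Male'], ['Male'], ['Female']],
    'Interest': [['Dance', 'Music'], ['Dance', 'Sports'], ['Hiking', 'Surfing']],
})

# Option 2: multi-column explode, then join the Interest lists
df = df.explode(['Age', 'Sex'])
df['Interest'] = df['Interest'].apply(', '.join)

# explode keeps the exploded columns as object dtype
age_dtype_before = df['Age'].dtype
print(age_dtype_before)  # object

# infer_objects() re-infers dtypes for object columns, so Age becomes int64
df = df.infer_objects()
print(df.dtypes)
```

infer_objects() only performs "soft" conversions of object columns; if a column mixes types, it is left as object.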

EDIT

Option 3. If you have many columns that contain lists, possibly with missing values stored as np.nan, you can collect the names of the list columns and then loop over them as follows:

# Get columns which contain at least one python list

list_cols = [c for c in df 
             if df[c].apply(lambda x: isinstance(x, list)).any()]
list_cols

['Age', 'Sex', 'Interest']


# Process each column

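for c in list_cols:
    # If every list in column c holds a single item (np.nan cells are skipped):
    if (df[c].dropna().str.len() == 1).all():
        # .str[0] returns NaN for the missing cells
        df[c] = df[c].str[0]
    else:
        # Join the list items into one string; leave np.nan cells unchanged
        df[c] = df[c].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)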
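The same loop can be wrapped in a reusable function. This is a sketch only: unwrap_list_columns is a hypothetical name, not a pandas API. It assumes missing cells are np.nan, converts items with str() before joining so non-string items do not raise, uses apply instead of .str so that any non-list scalar is left untouched, and finishes with infer_objects() so that, for example, Age ends up as an integer column.

```python
import numpy as np
import pandas as pd


def unwrap_list_columns(df, sep=', '):
    """Hypothetical helper generalizing Option 3.

    Columns whose lists all hold one item are unwrapped to that item;
    other list columns are joined into one string with `sep`.
    Cells that are not lists (e.g. np.nan) are left unchanged.
    """
    out = df.copy()  # do not modify the caller's DataFrame
    for c in out.columns:
        is_list = out[c].apply(lambda x: isinstance(x, list))
        if not is_list.any():
            continue  # no lists in this column, e.g. 'ID'
        if (out.loc[is_list, c].apply(len) == 1).all():
            out[c] = out[c].apply(lambda x: x[0] if isinstance(x, list) else x)
        else:
            out[c] = out[c].apply(
                lambda x: sep.join(map(str, x)) if isinstance(x, list) else x
            )
    # Re-infer dtypes of object columns (e.g. Age -> int64)
    return out.infer_objects()


df = pd.DataFrame({
    'ID': ['A', 'B', 'C', 'D'],
    'Age': [[20], [21], [19], [24]],
    'Sex': [['Male'], ['Male'], ['Female'], np.nan],
    'Interest': [['Dance', 'Music'], ['Dance', 'Sports'],
                 ['Hiking', 'Surfing'], np.nan],
})

result = unwrap_list_columns(df)
print(result)
print(result.dtypes)
```

Returning a copy rather than assigning into df keeps the original list-valued frame available for comparison.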
