Sahil Dahiya

Reputation: 721

Converting lists in a pandas DataFrame into columns

city        state   neighborhoods       categories
Dravosburg  PA      [asas,dfd]          ['Nightlife']
Dravosburg  PA      [adad]              ['Auto_Repair','Automotive']

I have the dataframe above, and I want to convert each element of each list into its own column, for example:

city        state asas dfd adad Nightlife Auto_Repair Automotive
Dravosburg  PA    1    1   0    1         0           0
Dravosburg  PA    0    0   1    0         1           1

I am using the following code to do this:

def list2columns(df):
    """
    to convert list in the columns
    of a dataframe
    """
    columns = ['categories', 'neighborhoods']
    for col in columns:
        for i in range(len(df)):
            for element in eval(df.loc[i, "categories"]):
                if len(element) != 0:
                    if element not in df.columns:
                        df.loc[:, element] = 0
                    else:
                        df.loc[i, element] = 1

  1. How can I do this more efficiently?
  2. Why do I still get the warning below, even though I am already using df.loc?

    SettingWithCopyWarning: A value is trying to be set on a copy of a slice
    from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
    

Upvotes: 0

Views: 1230

Answers (2)

Matthias Fripp

Reputation: 18625

Since you're using eval(), I assume each cell holds a string representation of a list, rather than a list itself. Also, unlike your example above, I'm assuming there are quotes around the items in the lists in your neighborhoods column (df.loc[0, 'neighborhoods'] == "['asas','dfd']"), because otherwise your eval() would fail with a NameError.
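To see why the quotes matter, here is a small check using ast.literal_eval, a standard-library alternative to eval() that only parses Python literals (it is swapped in here for illustration; the answer itself uses eval()). The two strings are modeled on the question's neighborhoods column:

```python
import ast

quoted = "['asas','dfd']"   # items are string literals
unquoted = "[asas,dfd]"     # as printed in the question's table

# A real Python list of strings.
parsed = ast.literal_eval(quoted)

# Without quotes, asas and dfd are bare names rather than literals,
# so literal_eval rejects the string (plain eval() would raise NameError).
try:
    ast.literal_eval(unquoted)
    unquoted_ok = True
except ValueError:
    unquoted_ok = False
```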

If this is all correct, you could try something like this:

def list2columns(df):
    """
    to convert list in the columns of a dataframe
    """
    columns = ['categories', 'neighborhoods']
    new_cols = set()      # set of all new columns added
    for col in columns:
        for i in df.index:
            # get the list of columns to set
            set_cols = eval(df.loc[i, col])
            # set the values of these columns to 1 in the current row
            # (if this causes new columns to be added, other rows will get NaNs)
            for c in set_cols:
                df.loc[i, c] = 1
            # remember which new columns have been added
            new_cols.update(set_cols)
    # convert any un-set values in the new columns to 0;
    # assign back, because fillna(inplace=True) on df[...] would only fill a copy
    new_cols = list(new_cols)
    df[new_cols] = df[new_cols].fillna(value=0).astype(int)
    return df
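A related pitfall when filling the new columns: df[cols] with a list of columns returns a new DataFrame, so a pattern like df[cols].fillna(value=0, inplace=True) fills NaNs in that temporary object and can leave df unchanged (pandas may also warn about it). A small check with made-up data, showing that assigning the filled result back is the form that actually updates df:

```python
import warnings
import numpy as np
import pandas as pd

# Made-up frame where the indicator columns still contain NaNs.
df = pd.DataFrame({"city": ["Dravosburg", "Dravosburg"],
                   "Nightlife": [1.0, np.nan],
                   "Automotive": [np.nan, 1.0]})
new_cols = ["Nightlife", "Automotive"]

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about this pattern
    # df[new_cols] is a separate DataFrame; inplace=True modifies only that object.
    df[new_cols].fillna(value=0, inplace=True)
still_has_nans = df[new_cols].isna().any().any()

# Assigning the filled result back writes into df itself.
df[new_cols] = df[new_cols].fillna(0).astype(int)
```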

I can only speculate on an answer to your second question, about the SettingWithCopy warning.

It's possible (but unlikely) that using df.iloc instead of df.loc will help, since iloc is intended to select by integer row position. In your case, df.loc[i, col] only works with i taken from range(len(df)) because you haven't set an index, so pandas uses the default RangeIndex, whose labels match the row positions.
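A small illustration of the label-versus-position difference, using a made-up index of 10 and 20 (as you might get after filtering rows or calling set_index):

```python
import pandas as pd

# Same kind of data, but with a non-default index.
df = pd.DataFrame(
    {"city": ["Dravosburg", "Dravosburg"],
     "categories": ["['Nightlife']", "['Automotive']"]},
    index=[10, 20],
)

# .iloc selects by integer position, so position 0 is always the first row.
first_by_position = df.iloc[0, df.columns.get_loc("categories")]

# .loc selects by index label, and label 0 does not exist here.
try:
    df.loc[0, "categories"]
    label_lookup_ok = True
except KeyError:
    label_lookup_ok = False
```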

Another possibility is that the df that is passed in to your function is already a slice from a larger dataframe, and that is causing the SettingWithCopy warning.
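A minimal sketch of that second possibility, with a made-up three-row frame `big` standing in for the larger dataframe. Before pandas' Copy-on-Write mode, a frame produced by a boolean-mask selection such as big[big["state"] == "PA"] is tracked as a possible copy of a slice, and writing into it (even through .loc) can trigger SettingWithCopyWarning. Calling .copy() on the selection makes it an independent frame; since whether the un-copied version warns depends on the pandas version, only the copied path is shown:

```python
import warnings
import pandas as pd

# Made-up "larger" dataframe.
big = pd.DataFrame({
    "city": ["Dravosburg", "Dravosburg", "Phoenix"],
    "state": ["PA", "PA", "AZ"],
    "categories": ["['Nightlife']", "['Automotive']", "['Bars']"],
})

# Explicit copy: `pa` no longer refers back to `big`, so adding and
# setting columns on it is unambiguous.
pa = big[big["state"] == "PA"].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pa.loc[:, "Nightlife"] = 0   # new column on the copy
    pa.loc[0, "Nightlife"] = 1   # set a single cell by label

# Collect any SettingWithCopyWarning by name (the class is version-dependent).
setting_with_copy = [w for w in caught
                     if w.category.__name__ == "SettingWithCopyWarning"]
```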

I've also found that using df.loc with mixed indexing modes (logical selectors for rows and column names for columns) produces the SettingWithCopy warning; it's possible that your slice selectors are causing similar problems.

Hopefully the simpler and more direct indexing in the code above will avoid these problems. But please report back (and provide code to generate df) if you still see that warning.

Upvotes: 2

piRSquared

Reputation: 294278

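Work on a copy of the DataFrame inside the function and return the result, so the assignments never target a slice of another DataFrame: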

def list2columns(df):
    """
    to convert list in the columns
    of a dataframe
    """
    df = df.copy()  # independent copy, so df is never a slice of another frame
    columns = ['categories', 'neighborhoods']
    for col in columns:
        for i in range(len(df)):
            for element in eval(df.loc[i, col]):
                if len(element) != 0:
                    if element not in df.columns:
                        df.loc[:, element] = 0  # new indicator column, all 0
                    df.loc[i, element] = 1      # mark the current row
    return df
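For the efficiency part of the question, a vectorized sketch that avoids the per-row loops. It assumes each cell is a string like "['asas','dfd']" and that no element contains "|"; it swaps eval() for ast.literal_eval and relies on Series.str.join plus Series.str.get_dummies. The name list2columns_vectorized is hypothetical. Like the expected output in the question, it drops the original list columns:

```python
import ast
import pandas as pd

def list2columns_vectorized(df, columns=("neighborhoods", "categories")):
    """Hypothetical helper: replace each list column with 0/1 indicator columns.

    Assumes every cell is a string such as "['asas','dfd']".
    """
    pieces = [df.drop(columns=list(columns))]
    for col in columns:
        # Parse the string representation into a real Python list.
        lists = df[col].map(ast.literal_eval)
        # Join each list with "|" and build one 0/1 column per distinct element.
        pieces.append(lists.str.join("|").str.get_dummies(sep="|"))
    # Indicator frames share df's index, so concat lines the rows up.
    return pd.concat(pieces, axis=1)


df = pd.DataFrame({
    "city": ["Dravosburg", "Dravosburg"],
    "state": ["PA", "PA"],
    "neighborhoods": ["['asas','dfd']", "['adad']"],
    "categories": ["['Nightlife']", "['Auto_Repair','Automotive']"],
})
result = list2columns_vectorized(df)
print(result)
```

Within each source column, get_dummies sorts the new column names, so they may come out in a different order than in the question's example.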

Upvotes: 2
