Reputation: 721
city state neighborhoods categories
Dravosburg PA [asas,dfd] ['Nightlife']
Dravosburg PA [adad] ['Auto_Repair','Automotive']
I have above dataframe I want to convert each element of a list into column for eg:
city state asas dfd adad Nightlife Auto_Repair Automotive
Dravosburg PA 1 1 0 1 1 0
I am using following code to do this :
def list2columns(df):
"""
to convert list in the columns
of a dataframe
"""
columns=['categories','neighborhoods']
for col in columns:
for i in range(len(df)):
for element in eval(df.loc[i,"categories"]):
if len(element)!=0:
if element not in df.columns:
df.loc[:,element]=0
else:
df.loc[i,element]=1
Why still there is below warning when I am using df.loc already
SettingWithCopyWarning: A value is trying to be set on a copy of a slice
from a DataFrame.Try using .loc[row_indexer,col_indexer] = value instead
Upvotes: 0
Views: 1230
Reputation: 18625
Since you're using eval()
, I assume each column has a string representation of a list, rather than a list itself. Also, unlike your example above, I'm assuming there are quotes around the items in the lists in your neighborhoods
column (df.iloc[0, 'neighborhoods'] == "['asas','dfd']"
), because otherwise your eval() would fail.
If this is all correct, you could try something like this:
def list2columns(df):
"""
to convert list in the columns of a dataframe
"""
columns = ['categories','neighborhoods']
new_cols = set() # list of all new columns added
for col in columns:
for i in range(len(df[col])):
# get the list of columns to set
set_cols = eval(df.iloc[i, col])
# set the values of these columns to 1 in the current row
# (if this causes new columns to be added, other rows will get nans)
df.iloc[i, set_cols] = 1
# remember which new columns have been added
new_cols.update(set_cols)
# convert any un-set values in the new columns to 0
df[list(new_cols)].fillna(value=0, inplace=True)
# if that doesn't work, this may:
# df.update(df[list(new_cols)].fillna(value=0))
I can only speculate on an answer to your second question, about the SettingWithCopy warning.
It's possible (but unlikely) that using df.iloc
instead of df.loc
will help, since that is intended to select by row number (in your case, df.loc[i, col]
only works because you haven't set an index, so pandas uses the default index, which matches the row number).
Another possibility is that the df
that is passed in to your function is already a slice from a larger dataframe, and that is causing the SettingWithCopy warning.
I've also found that using df.loc
with mixed indexing modes (logical selectors for rows and column names for columns) produces the SettingWithCopy warning; it's possible that your slice selectors are causing similar problems.
Hopefully the simpler and more direct indexing in the code above will solve any of these problems. But please report back (and provide code to generate df
) if you are still seeing that warning.
Upvotes: 2
Reputation: 294278
Use this instead
def list2columns(df):
"""
to convert list in the columns
of a dataframe
"""
df = df.copy()
columns=['categories','neighborhoods']
for col in columns:
for i in range(len(df)):
for element in eval(df.loc[i,"categories"]):
if len(element)!=0:
if element not in df.columns:
df.loc[:,element]=0
else:
df.loc[i,element]=1
return df
Upvotes: 2