Reputation: 127
I have a pandas dataframe where one column, RESULT, contains lists nested inside a list.
ID RESULT
0 A [nan, ['PASS'], nan, nan]
1 B [['FAIL'], nan, nan, nan]
2 C [['PASS'], nan, nan, nan]
3 D [nan, nan, nan, nan]
4 E [nan, ['FAIL'], nan, nan]
I want to make the RESULT column a flat list. For example, the first row would become [nan, 'PASS', nan, nan]. The final result should look like this:
ID RESULT
0 A [nan, 'PASS', nan, nan]
1 B ['FAIL', nan, nan, nan]
2 C ['PASS', nan, nan, nan]
3 D [nan, nan, nan, nan]
4 E [nan, 'FAIL', nan, nan]
I tried to create a function but it is not updating the column to a flat list. Below is the code I tried.
def flatten_list(mylist):
    # print(mylist)
    for index, value in enumerate(mylist):
        if type(value) is list:
            mylist[index] = value[0]
        # print(mylist)
        return mylist
df_bin['RESULT'] = df_bin['RESULT'].apply(flatten_list)
But if I try the simple example below, it works. I wonder what the difference is. I would appreciate any guidance. Also, is it possible to use a lambda function to achieve the same result?
from numpy import nan

mylist = [nan, ['PASS'], nan, nan]
for n, i in enumerate(mylist):
    if type(i) is list:
        mylist[n] = i[0]
print(mylist)
Upvotes: 4
Views: 4390
Reputation: 14096
You're almost there. You just have to unindent the return statement so that it runs only after the loop has finished:
def flatten_list(mylist):
    # print(mylist)
    for index, value in enumerate(mylist):
        if type(value) is list:
            mylist[index] = value[0]
    # print(mylist)
    return mylist  # <- indentation issue here.
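To see why the position of return matters, here is a minimal sketch that compares the two versions on the first row from the question. The names flatten_early_return and flatten_fixed are hypothetical, and the exact indentation of the original return is an assumption (placed inside the loop body).

```python
nan = float('nan')  # stand-in for numpy.nan so the sketch needs no imports

def flatten_early_return(mylist):
    # Assumed shape of the original bug: return sits inside the loop,
    # so the function exits after inspecting only the first element.
    for index, value in enumerate(mylist):
        if type(value) is list:
            mylist[index] = value[0]
        return mylist

def flatten_fixed(mylist):
    # return is at function level, so every element is visited first.
    for index, value in enumerate(mylist):
        if type(value) is list:
            mylist[index] = value[0]
    return mylist

row_buggy = flatten_early_return([nan, ['PASS'], nan, nan])
row_fixed = flatten_fixed([nan, ['PASS'], nan, nan])
print(row_buggy[1])  # ['PASS']  (still nested)
print(row_fixed[1])  # PASS
```

Both versions modify the list they receive in place, which is why the result is visible on the original object as well as on the return value.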
Here is a more general solution that also works if a sublist contains more than one item:
def flatten_list(cell):
    fcell = []
    for item in cell:
        if isinstance(item, list):
            fcell += item
        else:
            fcell += [item]
    return fcell
df_bin['RESULT'] = df_bin['RESULT'].apply(flatten_list)
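The question also asks whether a lambda can do the same job. A minimal sketch, assuming each nested list holds exactly one item as in the sample data (flatten_cell and rows are hypothetical names):

```python
nan = float('nan')  # stand-in for numpy.nan

# Take the single item out of each one-element sublist; leave other values alone.
flatten_cell = lambda cell: [x[0] if isinstance(x, list) else x for x in cell]

rows = [[nan, ['PASS'], nan, nan], [['FAIL'], nan, nan, nan]]
flat_rows = [flatten_cell(row) for row in rows]
print(flat_rows[0][1], flat_rows[1][0])  # PASS FAIL
```

With the dataframe from the question, the same lambda could be passed straight to df_bin['RESULT'].apply(...). Unlike the loop version, it builds a new list instead of modifying the original, and it would raise an IndexError on an empty sublist.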
Upvotes: 1
Reputation: 953
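If you care about performance, an alternative that avoids an explicit Python loop is numpy.hstack. Be aware that because the list mixes floats and strings, hstack converts every element to a string, so nan comes back as the string 'nan', as the output below shows. Here is an example.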
from numpy import hstack, nan
lst = [nan, ['PASS'], nan, nan]
lst2 = list(hstack(lst))
print(lst2)
Output:
['nan', 'PASS', 'nan', 'nan']
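In the output above, nan is the string 'nan' rather than a float. If the float NaN has to survive, one standard-library alternative is itertools.chain. This is a swapped-in technique, not part of the hstack approach, and as_list is a hypothetical helper:

```python
from itertools import chain
from math import isnan

nan = float('nan')  # stand-in for numpy.nan

def as_list(item):
    # Wrap scalars so every element can be chained as an iterable.
    return item if isinstance(item, list) else [item]

lst = [nan, ['PASS'], nan, nan]
lst2 = list(chain.from_iterable(as_list(item) for item in lst))
print(lst2)            # [nan, 'PASS', nan, nan]
print(isnan(lst2[0]))  # True
```

Like the hstack version, this flattens one level of nesting only.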
Upvotes: 1
Reputation: 572
It is possible to do this using the internal flatten function from pandas.core.common:
import pandas as pd
from pandas.core.common import flatten
df = pd.DataFrame({
    'ID': ['A', 'B'],
    'Result': [['nan', ['PASS'], 'nan', 'nan'], [['FAIL'], 'nan', 'nan', 'nan']],
})
df['Result'] = df['Result'].apply(lambda x: list(flatten(x)))
Output:
ID Result
0 A [nan, PASS, nan, nan]
1 B [FAIL, nan, nan, nan]
Based on your example, I guess this should work.
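Since pandas.core.common is not part of the public pandas API, that import may change between versions. A minimal sketch of a standalone generator that behaves similarly for nested lists; flatten_nested is a hypothetical name, not a pandas function:

```python
def flatten_nested(items):
    # Recursively yield leaf values from arbitrarily nested lists.
    for item in items:
        if isinstance(item, list):
            yield from flatten_nested(item)
        else:
            yield item

cells = [
    ['nan', ['PASS'], 'nan', 'nan'],
    [['FAIL'], 'nan', 'nan', 'nan'],
]
result = [list(flatten_nested(cell)) for cell in cells]
print(result[0])  # ['nan', 'PASS', 'nan', 'nan']
```

It could be dropped into the same apply call, for example df['Result'].apply(lambda x: list(flatten_nested(x))).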
Upvotes: 4