Reputation: 167
I have a DataFrame that looks like this:
DF =
index goal features
0 1 [[5.20281045, 5.3353545, 7.343434, ...], [2.33435, 4.2133, ...], ...]
1 0 [[7.23123213, 1.2323123, 2.232133, ...], [1.45456, 0.2313, 2.23213], ...]
...
The features column holds a very large number of values as a list of lists. The number of elements differs from row to row, so I want to flatten each list of lists into a single list and pad the shorter ones with 0 so that every row has the same length.
DF_Desired
index goal features
0 1 [5.20281045, 5.3353545, 7.343434, ..., 2.33435, 4.2133, ... , ...]
1 0 [7.23123213, 1.2323123, 2.232133, ..., 1.45456, 0.2313, 2.23213, ...]
Here is my code:
# Flatten each list
flat_list = []
for sublist in data["features"]:
    for item in sublist:
        flat_list.append(item)
or
flat_list = list(itertools.chain.from_iterable(data["features"]))
I (of course) cannot assign flat_list straight into the DF because its length does not match the index: "ValueError: Length of values (478) does not match length of index (2)".
# Make the Lists equal in length:
length = max(map(len, df["features"]))
X = np.array([xi+[0]*(length-len(xi)) for xi in df["features"])
print(X)
What this should do is flatten each cell of df["features"] into a single list and then pad each list with 0 where needed. But it just returns:
[[5.20281045, 5.3353545, 7.343434, ...]
[2.33435, 4.2133, ...]
[...]
...
[7.23123213, 1.2323123, 2.232133, ...]
[1.45456, 0.2313, 2.23213 ...]]
So what exactly did I do wrong?
Upvotes: 0
Views: 1347
Reputation: 3926
If I understood correctly, you want to flatten each list of lists into one list and make every entry in the features column the same length.
This can be achieved in the following manner:
# flattening
df.features = df.features.apply(lambda x: [leaf for tree in x for leaf in tree])
# make equal in length
max_len = df.features.apply(len).max()
def append_zeros(l):
    # list.append returns None, so build a new padded list instead
    if len(l) < max_len:
        return l + [0] * (max_len - len(l))
    else:
        return l
df.features = df.features.apply(append_zeros)
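
As a minimal end-to-end sketch of the flatten-then-pad idea, here is the same approach run on a small made-up DataFrame. The values and the variable name X are hypothetical stand-ins, not the asker's real data; the last step assumes the goal is a rectangular NumPy array, as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real "features" column.
df = pd.DataFrame({
    "goal": [1, 0],
    "features": [
        [[5.2, 5.3, 7.3], [2.3, 4.2]],
        [[7.2, 1.2], [1.4]],
    ],
})

# Flatten each cell: list of lists -> one flat list per row.
df["features"] = df["features"].apply(
    lambda rows: [value for row in rows for value in row]
)

# Pad every flat list with zeros up to the length of the longest one.
max_len = int(df["features"].apply(len).max())
df["features"] = df["features"].apply(
    lambda values: values + [0] * (max_len - len(values))
)

# With equal lengths, the column converts to a regular 2-D array.
X = np.array(df["features"].tolist())
print(X.shape)
```

Flattening has to happen per cell (inside apply), not over the whole column; iterating over the column itself only removes the row level and leaves the inner sublists in place.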
If I have not understood something clearly, please comment.
Upvotes: 3
Reputation: 4929
You can sum each list of lists with an empty list as the start value to get a flat list:
DF['features'] = DF.features.apply(lambda x: sum(x, []))
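
To illustrate what sum(x, []) does on a single cell, here is a small sketch with a made-up cell value (hypothetical data). It also shows itertools.chain.from_iterable applied per cell, which gives the same result and may be preferable for very long lists, since sum builds a new list at every step.

```python
from itertools import chain

# Hypothetical single cell of the "features" column.
cell = [[5.2, 5.3, 7.3], [2.3, 4.2], [1.0]]

# sum() starts from [] and concatenates each sublist in turn.
flat_with_sum = sum(cell, [])

# chain.from_iterable walks the sublists once and yields their items.
flat_with_chain = list(chain.from_iterable(cell))

print(flat_with_sum)
print(flat_with_chain)
```

Either expression can be used inside apply, for example lambda x: list(chain.from_iterable(x)).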
Upvotes: 3