Reputation: 4790
I have a dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})
df
             Binary_List
0  [0, 0, 1, 0, 0, 0, 0]
1  [0, 1, 0, 0, 0, 0, 0]
2  [0, 0, 1, 1, 0, 0, 0]
3  [0, 0, 0, 0, 1, 1, 1]
I want to apply a function to each list without using apply, because apply is very slow on a large dataset:
def count_one(lst):
    index = [i for i, e in enumerate(lst) if e != 0]
    # some more steps
    return len(index)
df['Value'] = df['Binary_List'].apply(lambda x: count_one(x))
df
             Binary_List  Value
0  [0, 0, 1, 0, 0, 0, 0]      1
1  [0, 1, 0, 0, 0, 0, 0]      1
2  [0, 0, 1, 1, 0, 0, 0]      2
3  [0, 0, 0, 0, 1, 1, 1]      3
I tried np.vectorize, but it gave no improvement:
vfunc = np.vectorize(count_one)
df['Value'] = vfunc(df['Binary_List'])
Passing the whole column to the function directly gives me an error:
df['Value'] = count_one(df['Binary_List'])
Upvotes: 1
Views: 353
Reputation: 5451
To count the ones in each list, you can convert the lists to strings and use the str accessor, like below:
df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})
# np.str was removed in NumPy 1.24; the built-in str does the same job
df["Binary_List"].astype(str).str.count("1")
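For reference, here is a self-contained sketch of the string-count idea. It assumes every element is a single-digit 0 or 1; counting the character "1" would also match digits inside values such as 10 or 11, so it is not a general replacement for count_one. The name ones_as_text is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})

# Each list becomes its text form, e.g. "[0, 0, 1, 0, 0, 0, 0]",
# and str.count then counts the "1" characters in that text.
# Assumption: elements are only 0 or 1.
ones_as_text = df['Binary_List'].astype(str).str.count('1')
print(ones_as_text.tolist())  # [1, 1, 2, 3]
```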
Upvotes: 1
Reputation: 30920
You can try DataFrame.explode:
df.explode('Binary_List').reset_index().groupby('index').sum()
       Binary_List
index
0                1
1                1
2                2
3                3
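A possible variant of the explode approach, sketched here as an assumption rather than as part of the original answer: explode the column as a Series, test each element against 0 (mirroring the `e != 0` check in count_one), and sum per original row with groupby(level=0). The names exploded and counts are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})

# One row per list element; the original row label is repeated in the index.
exploded = df['Binary_List'].explode()

# True where the element is non-zero, then count the True values per original row.
# Assumption: no list is empty (an empty list explodes to NaN, which compares as != 0).
counts = exploded.ne(0).groupby(level=0).sum()
print(counts.tolist())  # [1, 1, 2, 3]
```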
You can also use a list comprehension:
pd.Series([np.array(key).sum() for key in df['Binary_List']])
0    1
1    1
2    2
3    3
dtype: int64
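If all the lists have the same length (an assumption; it holds for the sample data), another option is to stack them into one 2-D NumPy array and count non-zero entries along each row. After the initial conversion this avoids a Python-level function call per row. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})

# Assumption: every list has the same length, so the result is a 2-D array
# of shape (n_rows, list_length). Ragged lists would not stack this way.
arr = np.array(df['Binary_List'].tolist())

# Count the non-zero entries in each row, matching the e != 0 test in count_one.
df['Value'] = np.count_nonzero(arr, axis=1)
print(df['Value'].tolist())  # [1, 1, 2, 3]
```

Whether this is actually faster depends on the data size and on what the "some more steps" in count_one do, so it is worth timing on the real dataset.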
Upvotes: 1