I have a nested for loop, something like:
count = 0
for x in df['text']:
    for i in x:
        if i in someList:
            count += 1
Here df['text'] is a Series of lists of words, such as ['word1', 'word2', 'etc'].
I know I can just use the for loop, but I want to convert it into a lambda function.
I tried doing:
df['in'] = df['text'].apply(lambda x: [count++ for i in x if i in someList])
but it is not valid syntax. How can I modify it to get what I want?
import numpy as np
import pandas as pd

someList = [*'ABCD']
df = pd.DataFrame(dict(text=[*map(list, 'AB CD AF EG BH IJ ACDE'.split())]))
df
text
0 [A, B]
1 [C, D]
2 [A, F]
3 [E, G]
4 [B, H]
5 [I, J]
6 [A, C, D, E]
np.add.at with __contains__
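# Row number for every word, e.g. [0, 0, 1, 1, 2, 2, ...]
i = np.arange(len(df)).repeat(df.text.str.len())
# One counter per row
a = np.zeros(len(df), int)
# Flatten all words, test membership, and accumulate the hits into each row's counter
np.add.at(a, i, [*map(someList.__contains__, np.concatenate(df.text))])
# 'in' is a Python keyword, so the column name is passed via **{...}
df.assign(**{'in': a})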
text in
0 [A, B] 2
1 [C, D] 2
2 [A, F] 1
3 [E, G] 0
4 [B, H] 1
5 [I, J] 0
6 [A, C, D, E] 3
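Why np.add.at rather than a[i] += ...: with fancy indexing, += is buffered, so a row index that appears several times is only incremented once, whereas np.add.at accumulates every occurrence. A small sketch with made-up arrays (rows and hits are illustrative names, not from the answer):

```python
import numpy as np

# Illustrative data: row index for each flattened word, and whether it matched.
# Row 0 appears twice and row 2 three times.
rows = np.array([0, 0, 1, 2, 2, 2])
hits = np.array([1, 1, 0, 1, 1, 1])

buffered = np.zeros(3, int)
buffered[rows] += hits             # repeated indices are written once, not summed

unbuffered = np.zeros(3, int)
np.add.at(unbuffered, rows, hits)  # every occurrence is accumulated

print(buffered)    # [1 0 1]
print(unbuffered)  # [2 0 3]
```

The unbuffered result is the per-row count the answer relies on.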
map, lambda, and __contains__
df.assign(**{'in': df.text.map(lambda x: sum(map(someList.__contains__, x)))})
text in
0 [A, B] 2
1 [C, D] 2
2 [A, F] 1
3 [E, G] 0
4 [B, H] 1
5 [I, J] 0
6 [A, C, D, E] 3
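To make the map/__contains__ trick concrete: someList.__contains__ is the bound method that the in operator uses on a list, and sum() counts True as 1. A minimal check with illustrative data (words is a hypothetical name):

```python
someList = [*'ABCD']
words = ['A', 'C', 'D', 'E']  # illustrative: the last row of the sample frame

# The bound method and the `in` operator produce the same booleans.
via_method = list(map(someList.__contains__, words))
via_in = [w in someList for w in words]

print(via_method)       # [True, True, True, False]
print(sum(via_method))  # 3, because True counts as 1
```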
You don't need any additional functions. Just create a sequence of ones (one per matching element) and sum it.
count = sum(1 for x in df['text'] for i in x if i in someList)
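The expression above returns a single total for the whole column. If a per-row count is wanted instead (as in the df['in'] attempt in the question), the same generator of ones can be applied to each list. A sketch reusing the sample data from the first answer; per_row is an illustrative name:

```python
import pandas as pd

someList = [*'ABCD']
df = pd.DataFrame(dict(text=[*map(list, 'AB CD AF EG BH IJ ACDE'.split())]))

# Total across every row: one `1` per matching word, summed.
count = sum(1 for x in df['text'] for i in x if i in someList)

# Per-row variant: the same generator of ones, evaluated on each list separately.
per_row = df['text'].apply(lambda x: sum(1 for i in x if i in someList))

print(count)             # 9
print(per_row.tolist())  # [2, 2, 1, 0, 1, 0, 3]
```

The per-row counts add up to the total, so either form can be derived from the other.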
I think you need to expand each row's list into columns and use isin, since with pandas we usually try not to use for loops.
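# Spread each list across columns (shorter lists are padded with missing values), flag matches, count per row
df['in'] = pd.DataFrame(df['text'].tolist(), index=df.index).isin(someList).sum(axis=1)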
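An alternative vectorized sketch, not part of the original answer, uses Series.explode (pandas 0.25 or later) to get one row per word instead of one column per position, then groups the match flags back by the original row label. The sample data is borrowed from the first answer; words is an illustrative name:

```python
import pandas as pd

someList = [*'ABCD']
df = pd.DataFrame(dict(text=[*map(list, 'AB CD AF EG BH IJ ACDE'.split())]))

# One row per word; the original row label is repeated for each word.
words = df['text'].explode()

# Flag matches, then add them back up per original row label.
df['in'] = words.isin(someList).groupby(level=0).sum()
print(df)
```

Empty lists become a single NaN row after explode, which isin treats as a non-match, so those rows still end up with 0.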