Reputation: 2150
I have a DataFrame like this:
import pandas as pd

df = pd.DataFrame({'text': ['No thank you', 'They didnt respond me'],
                   'pred': ['positive', 'negative'],
                   'score': ["[[0, 0, 1], [1, 0, 2], [1, 0, 0]]", "[[], [0, 1, 0], [], []]"]
                   })
(The score column is a string, but it can be converted to a list like this:
from ast import literal_eval
df["score"] = df["score"].apply(literal_eval)
)
which looks like this:
text                   pred      score
No thank you           positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]
They didnt respond me  negative  [[], [0, 1, 0], [], []]
The score column holds a 2D list (a list of inner lists). In each inner list, the first element is for positive, the second for negative, and the third for neutral.
What I want: if pred is positive, count the inner lists in score that are non-empty and whose first element is non-zero. The same logic applies to negative and neutral, using the second and third elements respectively.
So the result will look like this:
text                   pred      score                              count
No thank you           positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]  2
They didnt respond me  negative  [[], [0, 1, 0], [], []]            1
because in the first row pred is positive and two inner lists in score have a non-zero first element ([1, 0, 2] and [1, 0, 0]). In the second row pred is negative and only one inner list, [0, 1, 0], is non-empty with a non-zero second element.
What I have done so far:
m_sum = {"positive": 0, "negative": 1, "neutral": 2}
df["count"] = df.apply(
    lambda x: count(v[m_sum[x["pred"]]] for v in x["score"] if v and v!=0),
    axis=1)
But I cannot use count this way.
Thanks.
Upvotes: 1
Views: 124
Reputation: 120479
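Just replace '[]' with '[0, 0, 0]' so that every inner list has three elements, then count the non-zero entries per column and pick the column that matches the 'pred' value (m_sum is the mapping from the question). Counting is used rather than summing because entries such as 2 occur in the scores:
import ast
import numpy as np

# Count non-zero entries per column, then select the column for this row's label.
get_cnt = lambda x: np.count_nonzero(ast.literal_eval(x['score']), axis=0)[m_sum[x['pred']]]
df['count'] = df.replace({'score': {r'\[\]': '[0, 0, 0]'}}, regex=True) \
                .apply(get_cnt, axis=1)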
Output:
>>> df
text pred score count
0 No thank you positive [[0, 0, 1], [1, 0, 2], [1, 0, 0]] 2
1 They didnt respond me negative [[], [0, 1, 0], [], []] 1
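For scores that contain values larger than 1, a per-column sum and a per-column count of non-zero entries are not the same thing. A minimal sketch of the difference, assuming the inner lists of the first example row have already been parsed (the names `scores`, `col_sum` and `col_count` are only illustrative):

```python
import numpy as np

# Inner lists of the first example row, already parsed into integers.
scores = np.array([[0, 0, 1], [1, 0, 2], [1, 0, 0]])

col_sum = scores.sum(axis=0)                  # adds the values in each column
col_count = np.count_nonzero(scores, axis=0)  # counts non-zero entries in each column

print(col_sum.tolist())    # [2, 0, 3]
print(col_count.tolist())  # [2, 0, 2]
```

The two agree for the positive column here, but they differ for the neutral column because of the entry 2.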
Upvotes: 1
Reputation: 195543
You can use sum() to count the number of non-zero elements (each True comparison counts as 1):
# if not converted already, convert the "score" column to list:
# from ast import literal_eval
# df["score"] = df["score"].apply(literal_eval)
m_sum = {"positive": 0, "negative": 1, "neutral": 2}
df["count"] = df.apply(
    lambda x: sum(v[m_sum[x["pred"]]] != 0 for v in x["score"] if v),
    axis=1,
)
print(df)
Prints:
text pred score count
0 No thank you positive [[0, 0, 1], [1, 0, 2], [1, 0, 0]] 2
1 They didnt respond me negative [[], [0, 1, 0], [], []] 1
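The same rule can also be written as a named helper, which may be easier to test on its own than a lambda. This is only a sketch: `LABEL_INDEX` and `count_nonzero_for_label` are hypothetical names, and the helper assumes each non-empty inner list has exactly three elements, as in the question.

```python
from ast import literal_eval

# Position of each label inside an inner list, as described in the question.
LABEL_INDEX = {"positive": 0, "negative": 1, "neutral": 2}

def count_nonzero_for_label(score, pred):
    """Count inner lists that are non-empty and non-zero at the position of `pred`."""
    if isinstance(score, str):
        # Accept the raw string form of the column as well as a parsed list.
        score = literal_eval(score)
    idx = LABEL_INDEX[pred]
    return sum(1 for inner in score if inner and inner[idx] != 0)

# The two example rows from the question, as (pred, score) pairs.
rows = [
    ("positive", "[[0, 0, 1], [1, 0, 2], [1, 0, 0]]"),
    ("negative", "[[], [0, 1, 0], [], []]"),
]
counts = [count_nonzero_for_label(score, pred) for pred, score in rows]
print(counts)  # [2, 1]
```

On the DataFrame itself, one way to apply it would be `df["count"] = [count_nonzero_for_label(s, p) for s, p in zip(df["score"], df["pred"])]`.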
Upvotes: 2