sariii

Reputation: 2150

How to use count over a 2D list in pandas

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'text': ['No thank you', 'They didnt respond me'],
                   'pred': ['positive', 'negative'],
                   'score': ["[[0, 0, 1], [1, 0, 2], [1, 0, 0]]", "[[], [0, 1, 0], [], []]"]
                   })

The score values are strings, but they can be converted to lists with ast.literal_eval:

from ast import literal_eval
df["score"] = df["score"].apply(literal_eval)

The DataFrame looks like this:

                    text      pred                              score
0           No thank you  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]
1  They didnt respond me  negative            [[], [0, 1, 0], [], []]

The score column is a 2D list: within each inner list, the first element is for positive, the second for negative and the third for neutral.

What I want: for each row, count the inner lists in score that are non-empty and have a non-zero value at the position matching pred (first element for positive, second for negative, third for neutral).

So the result will look like this:

                    text      pred                              score  count
0           No thank you  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]      2
1  They didnt respond me  negative            [[], [0, 1, 0], [], []]      1

In the first row pred is positive, and two inner lists ([1, 0, 2] and [1, 0, 0]) have a non-zero first element. In the second row pred is negative, and one non-empty inner list ([0, 1, 0]) has a non-zero second element.
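The expected counts can be checked by hand with a small helper. This is a sketch, assuming score has already been parsed into a list of lists; count_for_pred is a hypothetical name used only for illustration, not part of pandas.

```python
# Position of each class inside every inner score list (mapping from the question)
m_sum = {"positive": 0, "negative": 1, "neutral": 2}


def count_for_pred(score, pred):
    """Count inner lists that are non-empty and non-zero at pred's position.

    count_for_pred is a hypothetical helper for illustration only.
    """
    idx = m_sum[pred]
    return sum(1 for inner in score if inner and inner[idx] != 0)


# Row 0: positive -> first elements are 0, 1, 1 -> two are non-zero
print(count_for_pred([[0, 0, 1], [1, 0, 2], [1, 0, 0]], "positive"))  # 2
# Row 1: negative -> only [0, 1, 0] is non-empty, and its second element is 1
print(count_for_pred([[], [0, 1, 0], [], []], "negative"))  # 1
```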

What I have done so far:

m_sum = {"positive": 0, "negative": 1, "neutral": 2}
df["count"] = df.apply(
    lambda x: count(v[m_sum[x["pred"]]] for v in x["score"] if v and v!=0),
    axis=1)

But this doesn't work: Python has no built-in count() function, so the lambda raises a NameError.
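Apart from count, the condition v!=0 in the attempt does not filter what it seems to: v is a whole inner list, and a list never compares equal to the integer 0, so only the "if v" part (the non-empty check) actually removes anything. A quick check:

```python
inner_lists = [[], [0, 1, 0], [0, 0, 0]]

for v in inner_lists:
    # bool(v) is False only for the empty list;
    # a list compared with the int 0 is never equal, so v != 0 is always True
    print(v, bool(v), v != 0)
```

Note that [0, 0, 0] passes both checks, so the non-zero test has to index into the inner list (for example v[idx] != 0) rather than compare the list itself.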

Thanks.

Upvotes: 1

Views: 124

Answers (2)

Corralien

Reputation: 120479

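Replace '[]' with '[0, 0, 0]', count the non-zero values at each position across the inner lists, and pick the position that matches the 'pred' column (this works on the original string column):

import ast
import numpy as np
m_sum = {"positive": 0, "negative": 1, "neutral": 2}
# Parse the string, count non-zero values per position, then select pred's position
get_cnt = lambda x: np.count_nonzero(ast.literal_eval(x['score']), axis=0)[m_sum[x['pred']]]

df['count'] = df.replace({'score': {r'\[\]': '[0, 0, 0]'}}, regex=True) \
                .apply(get_cnt, axis=1)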

Output:

>>> df
                    text      pred                              score  count
0           No thank you  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]      2
1  They didnt respond me  negative            [[], [0, 1, 0], [], []]      1
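Counting non-zero entries and summing values give different results as soon as a position holds something other than 0 or 1. On the first row, the neutral (third) position sums to 3, but only two inner lists are non-zero there. A small comparison with NumPy:

```python
import numpy as np

# First row's score, already parsed into a list of lists
score = [[0, 0, 1], [1, 0, 2], [1, 0, 0]]

sums = np.sum(score, axis=0)              # column-wise sum of the values
counts = np.count_nonzero(score, axis=0)  # column-wise count of non-zero values

print(sums)    # [2 0 3]
print(counts)  # [2 0 2]
```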

Upvotes: 1

Andrej Kesely

Reputation: 195543

You can use sum() to count the number of non-zero elements at the position given by pred:

# if not converted already, convert the "score" column to list:
# from ast import literal_eval
# df["score"] = df["score"].apply(literal_eval)

m_sum = {"positive": 0, "negative": 1, "neutral": 2}

df["count"] = df.apply(
    lambda x: sum(v[m_sum[x["pred"]]] != 0 for v in x["score"] if v),
    axis=1,
)
print(df)

Prints:

                    text      pred                              score  count
0           No thank you  positive  [[0, 0, 1], [1, 0, 2], [1, 0, 0]]      2
1  They didnt respond me  negative            [[], [0, 1, 0], [], []]      1
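The sum() approach works because bool is a subclass of int in Python, so each True from the != 0 comparison counts as 1 and each False as 0. The "if v" filter skips empty inner lists before indexing; without it, v[idx] on an empty list would raise IndexError. A minimal standalone illustration on the second row:

```python
score = [[], [0, 1, 0], [], []]
idx = 1  # position of "negative" in each inner list

# Empty lists are skipped by `if v`; each comparison yields True (1) or False (0)
flags = [v[idx] != 0 for v in score if v]
print(flags)         # [True]
print(sum(flags))    # 1
print(True + True)   # 2, because bool is a subclass of int
```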

Upvotes: 2
