Reyhan

Reputation: 105

How can I create a new column in a pandas DataFrame based on my existing columns?

I have a pandas data frame which looks like this:

          Tweets                      negative_keywords positive_keywords neutral_keywords
0   Şanlıurfa'da DAEŞ ile                       []            []             [neutral]
1   Hacettepe Üni. Araştırması                  []            []             [neutral]
2   Kadına şiddetin suç olduğu                [suç]           []                []
3   Suriyeli'lerin fal bakabilme                []            []             [neutral]
4   Hastaneye git Suriyeli. PTT ye          [Plaja]         [kardeşi]           []

Based on the values in those three columns, I want a fourth column, keyword_category, that looks like this:

          Tweets                      negative_keywords positive_keywords neutral_keywords  keyword_category
0   Şanlıurfa'da DAEŞ ile                       []            []             [neutral]       neutral
1   Hacettepe Üni. Araştırması                  []            []             [neutral]       neutral 
2   Kadına şiddetin suç olduğu                [suç]           []                []           negative
3   Suriyeli'lerin fal bakabilme                []            []             [neutral]       neutral
4   Hastaneye git Suriyeli. PTT ye              []         [kardeşi]           []            positive

So, if there is a keyword in the positive_keywords column, keyword_category should be positive; if there is a keyword in the negative_keywords column, it should be negative; and if there is one in neutral_keywords, it should be neutral. I also want these values to be plain strings, without the [] brackets around them.
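The three rules above can be sketched without a Python-level loop using numpy.select, which returns, for each row, the choice paired with the first condition that is True. This sketch assumes the keyword cells hold real Python lists (as the printout suggests) and uses a hypothetical default label "unknown" for rows with no keywords at all. Row 4 of the sample has both a negative keyword (Plaja) and a positive one (kardeşi), so with the order used here it comes out "negative"; put has_positive first in the condition list if positive keywords should win.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table above (the tweets are truncated there too).
df = pd.DataFrame({
    "Tweets": [
        "Şanlıurfa'da DAEŞ ile",
        "Hacettepe Üni. Araştırması",
        "Kadına şiddetin suç olduğu",
        "Suriyeli'lerin fal bakabilme",
        "Hastaneye git Suriyeli. PTT ye",
    ],
    "negative_keywords": [[], [], ["suç"], [], ["Plaja"]],
    "positive_keywords": [[], [], [], [], ["kardeşi"]],
    "neutral_keywords": [["neutral"], ["neutral"], [], ["neutral"], []],
})

# One boolean mask per rule: True where the list in that cell is non-empty.
has_negative = df["negative_keywords"].map(len).gt(0).to_numpy()
has_positive = df["positive_keywords"].map(len).gt(0).to_numpy()
has_neutral = df["neutral_keywords"].map(len).gt(0).to_numpy()

# np.select picks the first condition that is True for each row,
# so the order of this list decides the label when several rules match.
df["keyword_category"] = np.select(
    [has_negative, has_positive, has_neutral],
    ["negative", "positive", "neutral"],
    default="unknown",  # hypothetical label for rows with no keywords
)

print(df[["Tweets", "keyword_category"]])
```

The resulting column holds plain strings, so no brackets appear in keyword_category.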

Upvotes: 2

Views: 517

Answers (1)

Haliaetus

Reputation: 490

I'd write a function that evaluates a single row of your df, then apply it to every row with pandas.DataFrame.apply (axis=1), assigning the result to the new column.

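import pandas as pd

def classify(item):
    # Each *_keywords cell is assumed to hold a Python list; an empty list means "no keywords".
    if len(item["negative_keywords"]) > 0:
        return "negative"
    if len(item["positive_keywords"]) > 0:
        return "positive"
    if len(item["neutral_keywords"]) > 0:
        return "neutral"
    return None  # no keywords at all; if several lists are non-empty, the first check wins

# axis=1 passes each row to classify as a Series
df["keyword_category"] = df.apply(classify, axis=1)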
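If the DataFrame was loaded from a CSV file, the keyword cells may be strings such as "['suç']" rather than lists. In that case len() counts characters (an empty "[]" has length 2), so any emptiness test based on length misfires. A minimal sketch, assuming that situation (the three-row data below is hypothetical), that parses the cells back into lists with ast.literal_eval before classifying:

```python
import ast

import pandas as pd

# Hypothetical data: keyword columns as they look after a round trip through a CSV file,
# where each list was written out as its string representation.
df = pd.DataFrame({
    "negative_keywords": ["[]", "['suç']", "[]"],
    "positive_keywords": ["[]", "[]", "['kardeşi']"],
    "neutral_keywords": ["['neutral']", "[]", "[]"],
})

keyword_columns = ["negative_keywords", "positive_keywords", "neutral_keywords"]

# Turn each string such as "['suç']" back into a real Python list.
for col in keyword_columns:
    df[col] = df[col].map(ast.literal_eval)


def classify(item):
    # An empty list is falsy, so `if item[...]` means "this row has at least one keyword".
    if item["negative_keywords"]:
        return "negative"
    if item["positive_keywords"]:
        return "positive"
    if item["neutral_keywords"]:
        return "neutral"
    return None  # no keywords in any column


df["keyword_category"] = df.apply(classify, axis=1)
print(df["keyword_category"].tolist())
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for parsing these strings.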

Upvotes: 1
