Reputation: 105
I have a pandas data frame which looks like this:
   Tweets                          negative_keywords  positive_keywords  neutral_keywords
0  Şanlıurfa'da DAEŞ ile           []                 []                 [neutral]
1  Hacettepe Üni. Araştırması      []                 []                 [neutral]
2  Kadına şiddetin suç olduğu      [suç]              []                 []
3  Suriyeli'lerin fal bakabilme    []                 []                 [neutral]
4  Hastaneye git Suriyeli. PTT ye  [Plaja]            [kardeşi]          []
Based on the values in those three columns, I want to add a fourth column, so that the result looks like this:
   Tweets                          negative_keywords  positive_keywords  neutral_keywords  keyword_category
0  Şanlıurfa'da DAEŞ ile           []                 []                 [neutral]         neutral
1  Hacettepe Üni. Araştırması      []                 []                 [neutral]         neutral
2  Kadına şiddetin suç olduğu      [suç]              []                 []                negative
3  Suriyeli'lerin fal bakabilme    []                 []                 [neutral]         neutral
4  Hastaneye git Suriyeli. PTT ye  []                 [kardeşi]          []                positive
So, if positive_keywords contains a keyword, keyword_category should be positive; if negative_keywords contains a keyword, it should be negative. The values in the new column should be plain strings, without the brackets [] around them.
Upvotes: 2
Views: 517
Reputation: 490
I'd write a function that evaluates a single row of your df, then apply it to every row with pandas.DataFrame.apply (using axis=1), assigning the result to the new column.
def classify(item):
    # item is one row; a non-empty list means that category has a keyword
    # (assumes the cells hold real lists, not the strings "[]")
    if len(item["negative_keywords"]) > 0:
        return "negative"
    if len(item["positive_keywords"]) > 0:
        return "positive"
    if len(item["neutral_keywords"]) > 0:
        return "neutral"
    return 0 # what if none are true? or if multiple are true?

df["keyword_category"] = df.apply(classify, axis=1)
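To try this end to end, here is a minimal self-contained sketch of the same row-wise apply idea. The sample rows are copied from the desired-output table in the question, and it assumes every cell is a real Python list. The name classify_row and the choice to return None when all three lists are empty are mine, not something the question specifies.

```python
import pandas as pd

# Sample rows taken from the question's desired-output table.
# Assumption: each keyword cell is a real Python list, not a string like "[]".
df = pd.DataFrame({
    "Tweets": [
        "Şanlıurfa'da DAEŞ ile",
        "Hacettepe Üni. Araştırması",
        "Kadına şiddetin suç olduğu",
        "Suriyeli'lerin fal bakabilme",
        "Hastaneye git Suriyeli. PTT ye",
    ],
    "negative_keywords": [[], [], ["suç"], [], []],
    "positive_keywords": [[], [], [], [], ["kardeşi"]],
    "neutral_keywords": [["neutral"], ["neutral"], [], ["neutral"], []],
})


def classify_row(row):
    # Checked in this order, so "negative" wins if several lists are non-empty.
    for label in ("negative", "positive", "neutral"):
        if len(row[label + "_keywords"]) > 0:
            return label
    return None  # hypothetical fallback: no keywords in any column


df["keyword_category"] = df.apply(classify_row, axis=1)
print(df["keyword_category"].tolist())
# ['neutral', 'neutral', 'negative', 'neutral', 'positive']
```

If the frame was loaded from a CSV, the cells are probably strings such as "[]" rather than lists; in that case one option is to convert them first, for example with ast.literal_eval, before applying the function.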
Upvotes: 1