Reputation: 674
How do I return the name of the column with the highest count of the entity label "GPE"? In this case I want my output to be just "text", because that column has two rows labelled 'GPE' while column text2 has 1 and column text3 has 0.
Code:
import spacy
import pandas as pd
import en_core_web_sm
nlp = en_core_web_sm.load()
text = [["Canada", 'University of California has great research', "non-location"],["China", 'MIT is at Boston', "non-location"]]
df = pd.DataFrame(text, columns = ['text', 'text2', 'text3'])
col_list = df.columns # obtains the columns of the dataframe
for col in col_list:
    df["".join(col)] = df[col].apply(lambda x: [[w.label_] for w in list(nlp(x).ents)]) # combine the ent_<<col_name>> as the new columns which contain the named entities.
df
Desired output:
text
Upvotes: 0
Views: 81
Reputation: 436
You can use value_counts(), or simply count the "GPE" entities in each column and keep the column with the largest total.
Your code should look like:
import spacy
import pandas as pd
import en_core_web_sm
nlp = en_core_web_sm.load()
text = [["Canada", 'University of California has great research', "non-location"],["China", 'MIT is at Boston', "non-location"]]
df = pd.DataFrame(text, columns = ['text', 'text2', 'text3'])
col_list = df.columns # obtains the columns of the dataframe
maxGPE = 0
index = ''
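for col in col_list:
    # total number of entities labelled "GPE" across all cells of this column
    newGPE = df[col].apply(lambda x: sum(ent.label_ == "GPE" for ent in nlp(x).ents)).sum()
    if newGPE > maxGPE:
        index = col
        maxGPE = newGPE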
print(index)
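The same per-column count can also be done without a manual loop, using df.apply and idxmax. This is only a sketch: get_labels and FAKE_LABELS are hypothetical stand-ins for the spaCy call so that it runs without the model installed, and the labels in the table are made up to match the counts described in the question, not real model output. With the real model you would pass something like lambda s: [e.label_ for e in nlp(s).ents] instead.

```python
import pandas as pd

# Hypothetical lookup table standing in for spaCy's entity labels.
# These labels are illustrative only, not actual model output.
FAKE_LABELS = {
    "Canada": ["GPE"],
    "China": ["GPE"],
    "MIT is at Boston": ["ORG", "GPE"],
    "University of California has great research": ["ORG"],
}


def get_labels(text):
    """Return the entity labels for one cell (empty list if none)."""
    return FAKE_LABELS.get(text, [])


def column_with_most(df, label_fn, target="GPE"):
    """Name of the column whose cells contain the most `target` labels.

    Returns None when no column contains the label at all.
    """
    # For each column, sum the number of target labels found in its cells.
    counts = df.apply(
        lambda col: col.map(lambda s: label_fn(s).count(target)).sum()
    )
    if counts.max() == 0:
        return None
    # idxmax returns the index label (here: the column name) of the maximum.
    return counts.idxmax()


df = pd.DataFrame(
    [["Canada", "University of California has great research", "non-location"],
     ["China", "MIT is at Boston", "non-location"]],
    columns=["text", "text2", "text3"],
)

print(column_with_most(df, get_labels))
```

If two columns tie, idxmax returns the first one in column order, which matches the behaviour of the strict > comparison in the loop above.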
Upvotes: 2