Marc-9

Reputation: 145

Pandas get count of value stored in an array in a column

I have a pandas DataFrame in which one of the columns holds an array of keywords. One row in the DataFrame looks like this:

id, jobtitle, company, url, keywords

1, Software Engineer, Facebook, http://xx.xx, [javascript, java, python]

The number of keywords per row can range from 1 to 40.

I would like to do some data analysis on them, such as:

  1. what keyword appears most often across the whole dataset
  2. what keywords appear most often for each job title/company

Apart from giving each keyword its own column and dealing with lots of NaN values, is there an easy way to answer these questions with Python (preferably pandas, as it's a DataFrame)?

Upvotes: 0

Views: 223

Answers (1)

GeorgesAA

Reputation: 153

You can do something like this:

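import pandas as pd

# Running tally of how often each keyword appears across all rows.
keyword_dict = {}

def count_keywords(keywords):
    # Add one to the tally for every keyword in this row's list.
    for item in keywords:
        if item in keyword_dict:
            keyword_dict[item] += 1
        else:
            keyword_dict[item] = 1

def new_function():
    data = {'keywords':
            [['hello', 'test'], ['test', 'other'], ['test', 'hello']]
            }
    df = pd.DataFrame(data)
    # map() is used only to call count_keywords once per row;
    # the Series it returns is discarded.
    df.keywords.map(count_keywords)

    print(keyword_dict)

if __name__ == '__main__':
    new_function()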

Output:

{'hello': 2, 'test': 3, 'other': 1}
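
The dictionary above answers the first question only. As a possible alternative that stays inside pandas and also covers the per-job-title question, here is a minimal sketch using `DataFrame.explode` (pandas 0.25 or later) followed by `value_counts`. The sample rows are made up for illustration, and it assumes the `keywords` column holds real Python lists; if the values are stored as strings such as "[javascript, java, python]", they would need to be parsed into lists first.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's DataFrame.
df = pd.DataFrame({
    "jobtitle": ["Software Engineer", "Software Engineer", "Data Analyst"],
    "company": ["Facebook", "Google", "Facebook"],
    "keywords": [
        ["javascript", "java", "python"],
        ["python", "java"],
        ["python", "sql"],
    ],
})

# One row per (original row, keyword) pair; other columns are repeated.
exploded = df.explode("keywords")

# 1. How often each keyword appears across the whole dataset,
#    sorted from most to least frequent.
overall_counts = exploded["keywords"].value_counts()

# 2. How often each keyword appears within each job title.
#    The result is a Series indexed by (jobtitle, keyword).
per_title_counts = exploded.groupby("jobtitle")["keywords"].value_counts()

# Most frequent keyword per job title (ties are broken arbitrarily).
top_per_title = per_title_counts.groupby(level=0).head(1)

print(overall_counts)
print(per_title_counts)
print(top_per_title)
```

Grouping by "company" instead of "jobtitle" gives the same breakdown per company. This avoids creating one column per keyword, so no NaN padding is needed.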

Upvotes: 1
