I have a pandas DataFrame where one of the columns holds an array of keywords. One row in the DataFrame looks like this:
id, jobtitle, company, url, keywords
1, Software Engineer, Facebook, http://xx.xx, [javascript, java, python]
However, the number of keywords in a row can range from 1 to 40.
I would like to do some data analysis, such as:
Apart from giving each keyword its own column and dealing with lots of NaN values, is there an easy way to answer these questions with Python (preferably pandas, since the data is already in a DataFrame)?
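One pandas-native way to avoid the NaN-padded columns is to switch to a "long" layout with `Series.explode`, which gives each keyword its own row. The sketch below assumes the `keywords` cells are Python lists; the second and third rows and the company name `ExampleCorp` are hypothetical, added only so the counts are non-trivial.

```python
import pandas as pd

# Hypothetical sample data in the shape described in the question.
# Only the first row comes from the question; the others are made up.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "jobtitle": ["Software Engineer", "Data Scientist", "Web Developer"],
    "company": ["Facebook", "ExampleCorp", "Facebook"],
    "keywords": [
        ["javascript", "java", "python"],
        ["python", "sql"],
        ["javascript", "css"],
    ],
})

# explode() turns each list element into its own row, copying the other
# columns, so lists of different lengths need no padding and no NaNs.
long_df = df.explode("keywords")

# How often each keyword appears across all postings.
keyword_counts = long_df["keywords"].value_counts()

# Keyword counts broken down by company (MultiIndex: company, keyword).
by_company = long_df.groupby("company")["keywords"].value_counts()

print(keyword_counts)
print(by_company)
```

Once the data is in this long form, most "how often / which company / which title" questions reduce to ordinary `groupby` and `value_counts` calls.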
You can do something like this:
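import pandas as pd

# Running tally of keyword -> number of occurrences across all rows.
keyword_dict = {}


def count_keywords(keywords):
    # Add every keyword from one row's list to the running tally.
    for item in keywords:
        if item in keyword_dict:
            keyword_dict[item] += 1
        else:
            keyword_dict[item] = 1


def new_function():
    data = {'keywords':
            [['hello', 'test'], ['test', 'other'], ['test', 'hello']]
            }
    df = pd.DataFrame(data)
    # Visit each row's keyword list; count_keywords is called only for its
    # side effect on keyword_dict.
    for keywords in df['keywords']:
        count_keywords(keywords)
    print(keyword_dict)


if __name__ == '__main__':
    new_function()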
Output:
{'hello': 2, 'test': 3, 'other': 1}
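The same tally can be produced without the global dictionary by using `collections.Counter`, which handles the "increment or initialise" bookkeeping, together with `itertools.chain.from_iterable` to flatten the per-row lists. This is a swapped-in standard-library technique, not the approach above; it reuses the same sample data.

```python
from collections import Counter
from itertools import chain

import pandas as pd

data = {'keywords': [['hello', 'test'], ['test', 'other'], ['test', 'hello']]}
df = pd.DataFrame(data)

# chain.from_iterable yields every keyword from every row in order;
# Counter counts them, keeping first-seen order like a normal dict.
keyword_counts = Counter(chain.from_iterable(df['keywords']))

print(dict(keyword_counts))  # {'hello': 2, 'test': 3, 'other': 1}
```

Because nothing is stored in module-level state, calling this twice does not double the counts, which the global `keyword_dict` version would do.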