Reputation: 713
I have a file with the following contents.
1234:yahoo\tgoogle\tmicrosoft\tapple\tyahoo
2345:apple\tgoogle\tgoogle
4567:yahoo\tapple\tapple
I would like to get output of the form searchTerm: UserCnt, searchCnt, for example:
yahoo: 2, 3
apple: 3, 4
and so on...
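from collections import defaultdict

fname = "/tmp/sample.txt"
with open(fname) as f:
    content = f.readlines()
# Split each line into [user_id, tab-separated terms]
value = [i.strip().split(':') for i in content]
# Named "searches" rather than "dict" to avoid shadowing the built-in
searches = {k: v.split('\t') for k, v in value}
d = defaultdict(int)
for k, v in searches.items():
    for name in v:
        d[name] += 1
    print(k, d)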
But how do I get the user count and the search count for each search term?
Upvotes: 0
Views: 49
Reputation: 1639
Yes, you can use a defaultdict to do this (a regular dict works too, but I think a defaultdict is more flexible).
In [35]: from collections import defaultdict
In [36]: a = defaultdict(defaultdict)
In [40]: l = ["1234:yahoo\tgoogle\tmicrosoft\tapple\tyahoo", "2345:apple\tgoogle\tgoogle", "4567:yahoo\tapple\tapple"]
In [48]: for li in l:
    ...:     search_id, terms = li.split(":")[0], li.split(":")[1]
    ...:     terms = terms.split("\t")
    ...:     for term in terms:
    ...:         if "search_cnt" in a[term]:
    ...:             a[term]["search_cnt"] += 1
    ...:         else:
    ...:             a[term]["search_cnt"] = 1
    ...:     for term in set(terms):
    ...:         if "user_cnt" in a[term]:
    ...:             a[term]["user_cnt"] += 1
    ...:         else:
    ...:             a[term]["user_cnt"] = 1
In [49]: a
Out[49]:
defaultdict(collections.defaultdict,
            {'apple': defaultdict(None, {'search_cnt': 4, 'user_cnt': 3}),
             'google': defaultdict(None, {'search_cnt': 3, 'user_cnt': 2}),
             'microsoft': defaultdict(None, {'search_cnt': 1, 'user_cnt': 1}),
             'yahoo': defaultdict(None, {'search_cnt': 3, 'user_cnt': 2})})
The defaultdict above contains the counts you need.
The reason I iterate over set(terms) in the second loop is that if one user searched for a term multiple times, that term's user count should only be incremented once :)
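If it helps, the same idea can also be sketched with collections.Counter, which removes the need for the if/else checks. This is only an alternative sketch, not a replacement for the session above: the function name count_terms is made up for illustration, and it assumes each user id appears on exactly one line, as in the sample data.

```python
from collections import Counter


def count_terms(lines):
    """Return {term: (user_cnt, search_cnt)} for lines shaped like 'id:term<TAB>term...'.

    Hypothetical helper; assumes each user id occurs on a single line.
    """
    search_cnt = Counter()
    user_cnt = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _user_id, _, terms_part = line.partition(":")
        terms = terms_part.split("\t")
        search_cnt.update(terms)     # every occurrence counts as a search
        user_cnt.update(set(terms))  # each term counts once per user
    return {term: (user_cnt[term], search_cnt[term]) for term in search_cnt}


sample = [
    "1234:yahoo\tgoogle\tmicrosoft\tapple\tyahoo",
    "2345:apple\tgoogle\tgoogle",
    "4567:yahoo\tapple\tapple",
]

counts = count_terms(sample)
for term in sorted(counts):
    users, searches = counts[term]
    print("{}: {}, {}".format(term, users, searches))
```

To run it against the file from the question, you could pass the open file object directly, e.g. count_terms(open("/tmp/sample.txt")), since iterating over a file yields its lines.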
Upvotes: 1