I have an assignment that has a data mining element. I need to find which authors collaborate the most across several publication webpages.
I've scraped the webpages and compiled the author text into a list.
My current output looks like this:
for author in list:
    print(author)
# output:
['Author 1', 'Author 2', 'Author 3']
['Author 2', 'Author 4', 'Author 1']
['Author 1', 'Author 5', 'Author 6', 'Author 7', 'Author 4']
...and so on for about 100 more rows.
My idea is to produce, for each row of the list, another list containing every unique pair in that row. E.g. the third demo row would give 'Author 1 + Author 5', 'Author 1 + Author 6', 'Author 1 + Author 7', 'Author 1 + Author 4', 'Author 5 + Author 6', 'Author 5 + Author 7', 'Author 5 + Author 4', 'Author 6 + Author 7', 'Author 6 + Author 4', 'Author 7 + Author 4'. Then I'd append these pair lists to one large list and put it through a counter to see which pairs come up most often.
The problem is I'm just not sure how to actually implement that pair matcher, so if anyone has any pointers that would be great. I'm sure it can't be that complicated an answer, but I've been unable to find it. Alternative ideas on how to measure collaboration would be good too.
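The pair matcher described above can be sketched with two nested index loops and no library at all. This is a minimal sketch, not the asker's code; row_pairs is a hypothetical helper name. It pairs each author only with the authors that come after it in the row, which reproduces the ten example pairs for the third row:

```python
def row_pairs(authors):
    """Return every unique pair in one row as 'A + B' strings,
    keeping the order in which the authors appear in the row."""
    pairs = []
    for i in range(len(authors)):
        # Pair author i only with authors after it, so each pair is
        # produced once and nobody is paired with themselves.
        for j in range(i + 1, len(authors)):
            pairs.append(authors[i] + ' + ' + authors[j])
    return pairs


row = ['Author 1', 'Author 5', 'Author 6', 'Author 7', 'Author 4']
print(row_pairs(row))
```

The same two authors can appear in either order in different rows ('Author 1 + Author 2' from the first demo row, 'Author 2 + Author 1' from the second), so sort each row before pairing if the resulting strings are going to be counted.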
It seems like you want to generate all subsets of size 2 of a given list. itertools.combinations will do just that:
import itertools
for author in lists:
    a = list(itertools.combinations(author, 2))
    print(a)
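To finish the counting step the question describes, the pairs from combinations can be fed straight into collections.Counter instead of building one large list first. This is a sketch under assumptions: lists holds only the three demo rows from the question, and each row is sorted first so that the same two authors always form the same tuple regardless of their order on the page.

```python
import itertools
from collections import Counter

# Sample rows copied from the question; the real list would hold ~100 rows.
lists = [
    ['Author 1', 'Author 2', 'Author 3'],
    ['Author 2', 'Author 4', 'Author 1'],
    ['Author 1', 'Author 5', 'Author 6', 'Author 7', 'Author 4'],
]

pair_counts = Counter()
for authors in lists:
    # Sorting makes ('Author 1', 'Author 2') and ('Author 2', 'Author 1')
    # the same tuple before it is counted.
    pair_counts.update(itertools.combinations(sorted(authors), 2))

# The pairs that co-occur most often, with their counts.
print(pair_counts.most_common(2))
```

With the demo rows, ('Author 1', 'Author 2') and ('Author 1', 'Author 4') each occur twice and every other pair once.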
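You could simply generate all pairs from the list with itertools.product, which computes the Cartesian product of the list with itself. Note that this yields ordered pairs, so every unordered pair appears twice, as (x, y) and (y, x), and each author is also paired with themselves: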
import itertools
a = ['Author 1', 'Author 5', 'Author 6', 'Author 7', 'Author 4']
list(itertools.product(a, a))
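To see the difference from combinations, the sketch below compares the two on the demo row. Filtering the product down to tuples whose first name sorts before the second is an assumption added here, not part of the answer; it leaves the same ten unordered pairs that combinations gives, with each pair's names in alphabetical order.

```python
import itertools

a = ['Author 1', 'Author 5', 'Author 6', 'Author 7', 'Author 4']

# product(a, a) yields every ordered pair: 5 * 5 = 25 tuples, including
# self-pairs such as ('Author 1', 'Author 1') and both (x, y) and (y, x).
all_ordered = list(itertools.product(a, a))

# Keeping only x < y drops the self-pairs and the mirrored duplicates.
unique = [(x, y) for x, y in all_ordered if x < y]

print(len(all_ordered), len(unique))
```

For this problem combinations is the more direct tool, since it never generates the 15 tuples that the filter throws away.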
You could use a dictionary where the pair is the key and the number of times it occurs is the value. You'll need to make sure that you always generate the same key for (Author1, Author2) and (Author2, Author1), but you could use alphabetical ordering to deal with that.
Then you simply increment the count stored for the pair whenever you encounter it.
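A minimal sketch of the dictionary approach, assuming the rows look like the demo rows in the question; count_pairs is a hypothetical helper name, and combinations is used here only as a convenient way to visit each pair in a row once. The key is the pair with its two names in alphabetical order, so both orderings land on the same entry:

```python
from itertools import combinations


def count_pairs(rows):
    """Count co-authorship pairs in a plain dict keyed by an
    alphabetically ordered (author, author) tuple."""
    counts = {}
    for authors in rows:
        for x, y in combinations(authors, 2):
            # Alphabetical ordering makes (A, B) and (B, A) share one key.
            key = (x, y) if x <= y else (y, x)
            # Increment the stored count, starting from 0 for a new pair.
            counts[key] = counts.get(key, 0) + 1
    return counts


rows = [
    ['Author 1', 'Author 2', 'Author 3'],
    ['Author 2', 'Author 4', 'Author 1'],
    ['Author 1', 'Author 5', 'Author 6', 'Author 7', 'Author 4'],
]
counts = count_pairs(rows)

# max() returns the first key with the highest count if there is a tie.
best = max(counts, key=counts.get)
print(best, counts[best])
```

This does the same job as collections.Counter; the explicit dict just makes the increment step visible.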