Reputation: 3
I scraped some data using beautifulsoup, and saved as .txt file. The data is movie reviews from IMDB.com I found a good word counting python code, so I could make a word frequency excel table. However, I could not draw graph just using frequency table.
I want to draw semantic network graph using UCINET (node size should be based on betweenness centrality.)
My question is how to make text file into adjacency matrix data to draw UCINET graph. like this http://www.umasocialmedia.com/socialnetworks/wp-content/uploads/2012/09/senatorsxsenators1.png I want to draw network graph using the words which is used from reviewers.
(calculate the frequency if two words came up in the same sentence, when they are matched row and column line)
Or. Could you tell me how to draw network graph (using betweenness Centrality) in Python Code??
Upvotes: 0
Views: 771
Reputation: 2304
Make a 2D 20x20 array, loop through each input string, and update your matrix using that string:
adjacency_matrix = [[0 for _ in range(20)] for _ in range(20)]
def get_lines(filename):
"""Returns the lines in the file"""
with open(filename, 'r') as fp:
return fp.readlines()
def update_matrix(matrix, mapping, string):
"""Update the given adjacency matrix using the given string."""
words = [_ for _ in re.split("\s+", string) if _ in mapping.keys()]
for word_1 in words:
for word_2 in words:
matrix[mapping[word_1]][mapping[word_2]] += 1
if __name__ == "__main__":
words_in_matrix = ["X-men", "awesome", "good", "bad", ... 16 more ...]
mapping = {word: index for index, word in enumerate(words_in_matrix)}
for line in get_lines("ibdb.txt"):
update_matrix(adjacency_matrix, mapping, line)
print(adjacency_matrix)
A function similar to update_matrix
may be useful, with matrix
as your adjacency matrix, mapping
a mapping of words to indices in your adjacency matrix, and string
your sample review.
You will need to modify this to your needs. Inputs may have periods or other noise characters, which will need to be stripped.
Upvotes: 1