Reputation: 119
I have a list of unicode string lists.
Each string list represents a different document with the strings representing the authors' names. Some documents have only one author while other documents can have multiple co-authors.
For example, a sample of authorship of three documents looks like this:
authors = [[u'Smith, J.', u'Williams, K.', u'Daniels, W.'], [u'Smith, J.'], [u'Williams, K.', u'Daniels, W.']]
I want to convert this list into a dictionary and a list.
First, a dictionary that provides an integer key for each name:
author_name = {0: u'Smith, J.', 1: u'Williams, K.', 2: u'Daniels, W.'}
Second, a list that identifies the authors for each document by the integer key:
doc_author = [[0, 1, 2], [0], [1, 2]]
What is the most efficient way to create these?
FYI: I need my author data in this format to run a pre-built author-topic LDA algorithm written in Python.
Upvotes: 2
Views: 2350
Reputation: 971
### list of lists
authors = [[u'Smith, J.', u'Williams, K.', u'Daniels, W.'], [u'Smith, J.'], [u'Williams, K.', u'Daniels, W.']]
### flatten the list
flat_list = [x for xs in authors for x in xs]
# print(flat_list)
### remove duplicates, keeping first-seen order (Python 3.7+); a set would give arbitrary order
res = list(dict.fromkeys(flat_list))
# print(res)
### create dict
dct = {}
for key, val in enumerate(res):
    dct[key] = val
print(dct)
**output**
{0: 'Smith, J.', 1: 'Williams, K.', 2: 'Daniels, W.'}
Upvotes: 0
Reputation: 971
lst = ['person', 'bicycle', 'car', 'motorbike', 'bus', 'truck']
dct = {}
for key, val in enumerate(lst):
    dct[key] = val
print(dct)
***output***
{0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorbike', 4: 'bus', 5: 'truck'}
Upvotes: 0
Reputation: 1121416
You need to invert your author_name dictionary; after that, the conversion of your list is trivial using a nested list comprehension:
author_to_id = {name: id for id, name in author_name.items()}
doc_author = [[author_to_id[name] for name in doc] for doc in authors]
Demo:
>>> authors = [[u'Smith, J.', u'Williams, K.', u'Daniels, W.'], [u'Smith, J.'], [u'Williams, K.', u'Daniels, W.']]
>>> author_name = {0: u'Smith, J.', 1: u'Williams, K.', 2: u'Daniels, W.'}
>>> author_to_id = {name: id for id, name in author_name.items()}
>>> [[author_to_id[name] for name in doc] for doc in authors]
[[0, 1, 2], [0], [1, 2]]
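If author_name does not exist yet, both structures can be built from authors in a single pass. The following is only a minimal sketch, assuming Python 3.7+ (for ordered dicts) and that keys should be assigned in first-seen order; the function name index_authors is hypothetical and not part of the question:

```python
def index_authors(docs):
    """Return (author_name, doc_author) for a list of author-name lists."""
    author_to_id = {}  # name -> integer key, assigned in first-seen order
    doc_author = []
    for doc in docs:
        # setdefault returns the existing key, or stores and returns the
        # next free integer (the current dict size) for an unseen name
        doc_author.append(
            [author_to_id.setdefault(name, len(author_to_id)) for name in doc]
        )
    # invert the lookup to get the integer -> name dictionary
    author_name = {idx: name for name, idx in author_to_id.items()}
    return author_name, doc_author


authors = [[u'Smith, J.', u'Williams, K.', u'Daniels, W.'], [u'Smith, J.'], [u'Williams, K.', u'Daniels, W.']]
author_name, doc_author = index_authors(authors)
print(author_name)  # {0: 'Smith, J.', 1: 'Williams, K.', 2: 'Daniels, W.'}
print(doc_author)   # [[0, 1, 2], [0], [1, 2]]
```

Each name costs one dictionary lookup, so the work grows linearly with the total number of author entries.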
Upvotes: 3