Reputation: 591
Reposting again because i didn't get a response to the first post
I have the following data is below:
desc = pd.DataFrame(description, columns =['new_desc'])
new_desc
257623 the public safety report is compiled from crim...
161135 police say a sea isle city man ordered two pou...
156561 two people are behind bars this morning, after...
41690 pumpkin soup is a beloved breakfast soup in ja...
70092 right now, 15 states are grappling with how be...
... ...
207258 operation legend results in 59 more arrests, i...
222170 see story, 3a
204064 st. louis — missouri secretary of state jason ...
151443 tony lavell jones, 54, of sunset view terrace,...
97367 walgreens, on the other hand, is still going t...
[9863 rows x 1 columns]
I'm trying to find the dominant topic within the documents, and When I run the following code
best_lda_model = lda_desc
data_vectorized = tfidf
lda_output = best_lda_model.transform(data_vectorized)
topicnames = ["Topic " + str(i) for i in range(best_lda_model.n_components)]
docnames = ["Doc " + str(i) for i in range(len(dataset))]
df_document_topic = pd.DataFrame(np.round(lda_output, 2), columns = topicnames, index = docnames)
dominant_topic = np.argmax(df_document_topic.values, axis = 1)
df_document_topic['dominant_topic'] = dominant_topic
I've tried tweaking the code, however, no matter what I change, I get the following error tracebook error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
c:\python36\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
1673
-> 1674 mgr = BlockManager(blocks, axes)
1675 mgr._consolidate_inplace()
c:\python36\lib\site-packages\pandas\core\internals\managers.py in __init__(self, blocks, axes, do_integrity_check)
148 if do_integrity_check:
--> 149 self._verify_integrity()
150
c:\python36\lib\site-packages\pandas\core\internals\managers.py in _verify_integrity(self)
328 if block.shape[1:] != mgr_shape[1:]:
--> 329 raise construction_error(tot_items, block.shape[1:], self.axes)
330 if len(self.items) != tot_items:
ValueError: Shape of passed values is (9863, 8), indices imply (0, 8)
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-41-bd470d69b181> in <module>
4 topicnames = ["Topic " + str(i) for i in range(best_lda_model.n_components)]
5 docnames = ["Doc " + str(i) for i in range(len(dataset))]
----> 6 df_document_topic = pd.DataFrame(np.round(lda_output, 2), columns = topicnames, index = docnames)
7 dominant_topic = np.argmax(df_document_topic.values, axis = 1)
8 df_document_topic['dominant_topic'] = dominant_topic
c:\python36\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
495 mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
496 else:
--> 497 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
498
499 # For data is list-like, or Iterable (will consume into list)
c:\python36\lib\site-packages\pandas\core\internals\construction.py in init_ndarray(values, index, columns, dtype, copy)
232 block_values = [values]
233
--> 234 return create_block_manager_from_blocks(block_values, [columns, index])
235
236
c:\python36\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
1679 blocks = [getattr(b, "values", b) for b in blocks]
1680 tot_items = sum(b.shape[0] for b in blocks)
-> 1681 raise construction_error(tot_items, blocks[0].shape[1:], axes, e)
1682
1683
ValueError: Shape of passed values is (9863, 8), indices imply (0, 8)
The desired results is to produce a list of documents according to a specific topic. Below is example code and desired output.
df_document_topic(df_document_topic['dominant_topic'] == 2).head(10)
When I run this code, I get the following traceback
TypeError Traceback (most recent call last)
<ipython-input-55-8cf9694464e6> in <module>
----> 1 df_document_topic(df_document_topic['dominant_topic'] == 2).head(10)
TypeError: 'DataFrame' object is not callable
Below is the desired output
Any help would be greatly appreciated.
Upvotes: 1
Views: 6214
Reputation: 1458
The index you're passing as docnames
is empty which is obtained from dataset
as follows:
docnames = ["Doc " + str(i) for i in range(len(dataset))]
So this means that the dataset
is empty too. For a workaround, you can create Doc
indices based on the size of lda_output
as follows:
docnames = ["Doc " + str(i) for i in range(len(lda_output))]
Let me know if this works.
Upvotes: 1