ValueError: Shape of passed values is (9863, 8), indices imply (0, 8)

Question

Reposting because I didn't get a response to my first post.

I have the following data:

desc = pd.DataFrame(description, columns =['new_desc'])

                                             new_desc
257623  the public safety report is compiled from crim...
161135  police say a sea isle city man ordered two pou...
156561  two people are behind bars this morning, after...
41690   pumpkin soup is a beloved breakfast soup in ja...
70092   right now, 15 states are grappling with how be...
...                                                   ...
207258  operation legend results in 59 more arrests, i...
222170                                      see story, 3a
204064  st. louis — missouri secretary of state jason ...
151443  tony lavell jones, 54, of sunset view terrace,...
97367   walgreens, on the other hand, is still going t...

[9863 rows x 1 columns]

I'm trying to find the dominant topic of each document. When I run the following code:

best_lda_model = lda_desc
data_vectorized = tfidf
lda_output = best_lda_model.transform(data_vectorized)
topicnames = ["Topic " + str(i) for i in range(best_lda_model.n_components)]
docnames = ["Doc " + str(i) for i in range(len(dataset))]
df_document_topic = pd.DataFrame(np.round(lda_output, 2), columns = topicnames, index = docnames)
dominant_topic = np.argmax(df_document_topic.values, axis = 1)
df_document_topic['dominant_topic'] = dominant_topic

I've tried tweaking the code, but no matter what I change, I get the following traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\python36\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
   1673 
-> 1674         mgr = BlockManager(blocks, axes)
   1675         mgr._consolidate_inplace()

c:\python36\lib\site-packages\pandas\core\internals\managers.py in __init__(self, blocks, axes, do_integrity_check)
    148         if do_integrity_check:
--> 149             self._verify_integrity()
    150 

c:\python36\lib\site-packages\pandas\core\internals\managers.py in _verify_integrity(self)
    328             if block.shape[1:] != mgr_shape[1:]:
--> 329                 raise construction_error(tot_items, block.shape[1:], self.axes)
    330         if len(self.items) != tot_items:

ValueError: Shape of passed values is (9863, 8), indices imply (0, 8)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
 in 
      4 topicnames = ["Topic " + str(i) for i in range(best_lda_model.n_components)]
      5 docnames = ["Doc " + str(i) for i in range(len(dataset))]
----> 6 df_document_topic = pd.DataFrame(np.round(lda_output, 2), columns = topicnames, index = docnames)
      7 dominant_topic = np.argmax(df_document_topic.values, axis = 1)
      8 df_document_topic['dominant_topic'] = dominant_topic

c:\python36\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    495                 mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
    496             else:
--> 497                 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
    498 
    499         # For data is list-like, or Iterable (will consume into list)

c:\python36\lib\site-packages\pandas\core\internals\construction.py in init_ndarray(values, index, columns, dtype, copy)
    232         block_values = [values]
    233 
--> 234     return create_block_manager_from_blocks(block_values, [columns, index])
    235 
    236 

c:\python36\lib\site-packages\pandas\core\internals\managers.py in create_block_manager_from_blocks(blocks, axes)
   1679         blocks = [getattr(b, "values", b) for b in blocks]
   1680         tot_items = sum(b.shape[0] for b in blocks)
-> 1681         raise construction_error(tot_items, blocks[0].shape[1:], axes, e)
   1682 
   1683 

ValueError: Shape of passed values is (9863, 8), indices imply (0, 8)

The desired result is a list of the documents whose dominant topic is a specific topic. Below is example code and the desired output.

df_document_topic(df_document_topic['dominant_topic'] == 2).head(10)

When I run this code, I get the following traceback:

TypeError                                 Traceback (most recent call last)
 in 
----> 1 df_document_topic(df_document_topic['dominant_topic'] == 2).head(10)

TypeError: 'DataFrame' object is not callable

Below is the desired output

Any help would be greatly appreciated.

Raghul Raj · Accepted Answer

The index you're passing, docnames, is empty. It is built from dataset as follows:

docnames = ["Doc " + str(i) for i in range(len(dataset))]

So dataset must be empty too: len(dataset) is 0, while lda_output has 9863 rows. That is what "indices imply (0, 8)" in the error means. As a workaround, build the Doc labels from the number of rows in lda_output instead:

docnames = ["Doc " + str(i) for i in range(len(lda_output))]
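A minimal sketch of the corrected step, assuming lda_output is the (n_documents, n_topics) array returned by best_lda_model.transform(...). A small hard-coded array (hypothetical values) stands in for it here so the snippet runs without a fitted model. Deriving the index from the rows being wrapped guarantees the two lengths match; the try block reproduces the original failure with an empty index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for best_lda_model.transform(data_vectorized):
# 4 documents x 3 topics, each row a topic distribution summing to 1.
lda_output = np.array([
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
    [0.30, 0.60, 0.10],
    [0.10, 0.10, 0.80],
])
n_components = lda_output.shape[1]  # plays the role of best_lda_model.n_components

topicnames = ["Topic " + str(i) for i in range(n_components)]
# Build the index from lda_output itself, not from a separate (possibly empty) dataset.
docnames = ["Doc " + str(i) for i in range(len(lda_output))]

df_document_topic = pd.DataFrame(np.round(lda_output, 2), columns=topicnames, index=docnames)
# Position of the largest probability in each row = dominant topic number.
df_document_topic["dominant_topic"] = np.argmax(df_document_topic.values, axis=1)
print(df_document_topic)

# Reproducing the original failure: an empty index cannot label 4 rows.
try:
    pd.DataFrame(lda_output, columns=topicnames, index=[])
except ValueError as err:
    print("ValueError:", err)
```

With the real model, the only change from the question's code is the docnames line; if dataset is expected to hold the documents, it is also worth checking why it ended up empty.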

Let me know if this works.
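Separately, the TypeError: 'DataFrame' object is not callable in the question comes from df_document_topic(...): parentheses try to call the DataFrame like a function. Boolean filtering uses square brackets (or .loc). A sketch on a small hypothetical stand-in for df_document_topic:

```python
import pandas as pd

# Hypothetical stand-in for df_document_topic, including its dominant_topic column.
df_document_topic = pd.DataFrame(
    {
        "Topic 0": [0.70, 0.05, 0.30, 0.10],
        "Topic 1": [0.20, 0.15, 0.60, 0.10],
        "Topic 2": [0.10, 0.80, 0.10, 0.80],
        "dominant_topic": [0, 2, 1, 2],
    },
    index=["Doc 0", "Doc 1", "Doc 2", "Doc 3"],
)

# Square brackets apply the boolean mask; parentheses would raise TypeError.
topic_2_docs = df_document_topic[df_document_topic["dominant_topic"] == 2].head(10)
# Equivalent, more explicit form using .loc.
topic_2_docs_loc = df_document_topic.loc[df_document_topic["dominant_topic"] == 2].head(10)
print(topic_2_docs)

# To list the documents for every topic at once, group by the dominant topic.
docs_by_topic = {
    topic: list(index)
    for topic, index in df_document_topic.groupby("dominant_topic").groups.items()
}
print(docs_by_topic)
```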
