How to choose a pandas layout in Python: columns vs rows

Question

I am quite new to pandas (a couple of months) and I am starting to build a project that will be based on a pandas DataFrame.

This DataFrame will be a table of the different key-words present in a collection of texts (around 100k docs and around 200 key-words).

Imagine, for instance, the words "car" and "motorbike", and documents numbered doc1, doc2, etc.

How should I arrange it? a) Every column name is a doc number and the index holds the words "car" and "motorbike", or b) the other way around: the index holds the doc numbers and the column headers are the words?

I don't know pandas well enough to foresee the consequences of this choice, and all the code will be based on that decision.

As a side note, the table is not static: more documents and more words will be added to it every now and again.

What would you recommend, a or b? And why?

Thanks.

Vinit Neogi · Accepted Answer

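Generally in pandas, the convention is that instances are rows (here the documents) and features are columns (here the words). So prefer approach 'b'. Each column then holds the values for a single word and has a single dtype, and with around 100k docs and around 200 key-words you get a long, narrow table rather than one with 100k columns.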
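As a rough illustration of approach 'b', here is a minimal sketch. The counts, the extra word "truck" and the extra document "doc4" are made up for the example; only "car", "motorbike" and the doc1, doc2 naming come from the question.

```python
import pandas as pd

# Layout b: one row per document, one column per key-word.
# The counts below are hypothetical example data.
counts = pd.DataFrame(
    {"car": [3, 0, 1], "motorbike": [0, 2, 1]},
    index=["doc1", "doc2", "doc3"],
)

# Selecting one word gives a Series indexed by document.
car_counts = counts["car"]

# Documents that mention "motorbike" at least once.
motorbike_docs = counts[counts["motorbike"] > 0].index.tolist()

# Adding a new key-word is a plain column assignment
# ("truck" is a hypothetical word for illustration).
counts["truck"] = [0, 1, 0]

# Adding new documents: collect them in their own frame and
# concatenate along the rows ("doc4" is hypothetical).
new_docs = pd.DataFrame(
    {"car": [2], "motorbike": [0], "truck": [4]},
    index=["doc4"],
)
counts = pd.concat([counts, new_docs])

# Layout a is only a transpose away if it is ever needed.
by_word = counts.T
```

Since documents will probably arrive far more often than new key-words, it may be worth batching new rows and calling `pd.concat` once per batch rather than once per document.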
