JFerro

Reputation: 3433

How to choose the arrangement of a pandas DataFrame: columns vs rows

I am quite new to pandas (a couple of months) and I am starting to build a project that will be based on a pandas DataFrame.

The DataFrame will be a table of the keywords present in a collection of texts (around 100k documents and around 200 keywords).

Imagine, for instance, the words "car" and "motorbike", and documents numbered doc1, doc2, etc.

How should I arrange it? a) Every column name is a document number and the index holds the words "car" and "motorbike", or b) the other way around: the index holds the document numbers and the column headers are the words?
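To make the two options concrete, here is a minimal sketch using made-up keyword counts for two hypothetical documents (the numbers are illustrative only). It builds layout a) and layout b) from the same dictionary and shows that one is simply the transpose of the other:

```python
import pandas as pd

# Made-up keyword counts for two hypothetical documents (illustration only).
counts = {
    "doc1": {"car": 3, "motorbike": 0},
    "doc2": {"car": 1, "motorbike": 2},
}

# Option a): documents as columns, keywords as the index.
# pd.DataFrame on a dict of dicts uses the outer keys as column labels.
layout_a = pd.DataFrame(counts)

# Option b): documents as the index, keywords as columns.
# orient="index" makes the outer keys the row labels instead.
layout_b = pd.DataFrame.from_dict(counts, orient="index")

print(layout_a)
print(layout_b)

# Both hold the same data; one is the transpose of the other.
assert layout_a.T.equals(layout_b)
```

Because `.T` converts one into the other, the choice mostly affects how convenient and efficient day-to-day operations are, not what can be stored.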

I don't have enough insight into pandas to foresee the consequences of this choice, and all the code will be based on that decision.

As a side note, the table is not static: more documents and more words will be added to it every now and then.
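A minimal sketch of how growth could look with layout b), assuming made-up counts and hypothetical keywords "bicycle" and "truck": new documents become new rows (added in a batch with `pd.concat`), and new keywords become new columns.

```python
import pandas as pd

# Layout b): one row per document, one column per keyword (made-up counts).
df = pd.DataFrame({"car": [3, 1], "motorbike": [0, 2]}, index=["doc1", "doc2"])

# A batch of new documents arrives; "bicycle" is a keyword not seen before.
# pd.concat aligns on column names and fills missing cells with NaN,
# so fill them with 0 and convert the counts back to integers.
new_docs = pd.DataFrame(
    {"car": [0], "motorbike": [5], "bicycle": [4]}, index=["doc3"]
)
df = pd.concat([df, new_docs]).fillna(0).astype(int)

# A new keyword can also be added directly as a column, defaulting to 0.
df["truck"] = 0

print(df)
```

Collecting new documents in a small frame and concatenating once is generally cheaper than inserting rows one at a time.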

What would you recommend, a or b, and why?

Thanks.

Upvotes: 0

Views: 50

Answers (1)

Vinit Neogi

Reputation: 386

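Generally in pandas, the convention is that instances (observations) are rows (here, the document numbers) and features are columns (here, the words). So prefer approach b. It also fits your data: around 100k documents as rows and around 200 keywords as columns gives a long, narrow table, which pandas handles more efficiently than one with 100k columns, and new documents simply become new rows.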
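As a rough illustration of why b) is convenient, here is a sketch with made-up counts for three hypothetical documents. Each keyword is a column, so typical questions map onto short column-wise expressions:

```python
import pandas as pd

# Layout b): documents as the index, keywords as columns (made-up counts).
df = pd.DataFrame(
    {"car": [3, 1, 0], "motorbike": [0, 2, 5]},
    index=pd.Index(["doc1", "doc2", "doc3"], name="doc"),
)

# Each keyword is a single column, so these operations work per column.
totals_per_keyword = df.sum()            # total occurrences of each keyword
docs_with_car = df.index[df["car"] > 0]  # documents that mention "car"
words_per_doc = df.sum(axis=1)           # total keyword hits per document

# If the other orientation is ever needed, it is one transpose away.
keywords_by_doc = df.T

print(totals_per_keyword)
print(list(docs_with_car))
print(words_per_doc)
```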

Upvotes: 1
