Converting document-term count in Pandas series into a python list

Question

I currently have a Pandas Series object where each index label is a term in the document, and each value is how many times that term occurs in the document. An example is shown below:

>>> import pandas as pd
>>> s = pd.Series([1, 4, 1, 2], index=["green", "blue", "red", "yellow"])
>>> print(s)
    green     1
    blue      4
    red       1
    yellow    2
    dtype: int64

My goal is to create a list of index names in which each name appears as many times as its value. The ideal output is shown below:

terms = ["green", "blue", "blue", "blue", "blue", "red", "yellow", "yellow"]
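For comparison, this expansion is exactly what the standard library's `collections.Counter.elements()` does: it yields each key as many times as its count. A minimal sketch, assuming Python 3.7+ so that the keys come back in the order of the Series index:

```python
from collections import Counter

import pandas as pd

# Term counts from the question: index label = term, value = occurrences.
s = pd.Series([1, 4, 1, 2], index=["green", "blue", "red", "yellow"])

# Counter.elements() repeats each key by its count, in insertion order;
# keys with a count below 1 are skipped.
counts = Counter(s.to_dict())
terms = list(counts.elements())
print(terms)
```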

My current code is the following:

termList = list()
termCount = zip(s.index, s.values)  # (term, count) pairs
for name, cnt in termCount:
    termList += [name]*cnt  # append the term cnt times

I receive the correct output, but I don't believe this method is very pythonic. Can anyone provide advice on how to improve it?
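Staying in pure Python, the same loop can be written as a single nested list comprehension. A minimal sketch, reusing the example Series; `Series.items()` yields the same (label, value) pairs as `zip(s.index, s.values)`:

```python
import pandas as pd

s = pd.Series([1, 4, 1, 2], index=["green", "blue", "red", "yellow"])

# Outer loop walks (term, count) pairs; inner loop emits the term `cnt` times.
terms = [name for name, cnt in s.items() for _ in range(cnt)]
print(terms)
```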

John Zwinck · Accepted Answer

Do it using NumPy instead of an explicit loop. `np.repeat` repeats each index label by the corresponding count:

>>> import numpy as np
>>> np.repeat(s.index.values, s.values)
array(['green', 'blue', 'blue', 'blue', 'blue', 'red', 'yellow', 'yellow'], dtype=object)
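Note that `np.repeat` returns a NumPy array, while the question asks for a Python list; `.tolist()` converts it. pandas also exposes the same operation directly as `Index.repeat`. A short sketch of both, reusing the example Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4, 1, 2], index=["green", "blue", "red", "yellow"])

# NumPy route: repeat the index values, then convert the array to a list.
terms_np = np.repeat(s.index.values, s.values).tolist()

# pandas route: Index.repeat returns a new Index, which also has .tolist().
terms_pd = s.index.repeat(s.values).tolist()

print(terms_np)
print(terms_pd)
```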
