Reputation: 381
I have a pandas Series whose index labels are the terms in a document and whose values are the number of times each term occurs in that document. An example is shown below:
>>> import pandas as pd
>>> s = pd.Series([1, 4, 1, 2], index=["green", "blue", "red", "yellow"])
>>> print(s)
green     1
blue      4
red       1
yellow    2
dtype: int64
My goal is to build a list of the index labels in which each label appears as many times as its value. The ideal output is shown below:
terms = ["green", "blue", "blue", "blue", "blue", "red", "yellow", "yellow"]
My current code is the following:
termList = list()
termCount = zip(s.index, s.values)
for name, cnt in termCount:
    termList += [name]*cnt
This produces the correct output, but I don't believe the method is very Pythonic. Can anyone advise on how to improve it?
Upvotes: 2
Views: 60
Reputation: 249394
Use NumPy's repeat instead of an explicit loop:
>>> import numpy as np
>>> np.repeat(s.index.values, s.values)
array(['green', 'blue', 'blue', 'blue', 'blue', 'red', 'yellow', 'yellow'], dtype=object)
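Since the question asks for a list rather than an array, here is a minimal sketch of how the result might be converted, using the example Series from the question. The variable names `terms` and `terms_via_index` are illustrative only, and `to_numpy()` is used here in place of `.values` as an assumption that it behaves the same for this data.

```python
import numpy as np
import pandas as pd

# Same example Series as in the question: term -> occurrence count.
s = pd.Series([1, 4, 1, 2], index=["green", "blue", "red", "yellow"])

# np.repeat returns an ndarray; .tolist() turns it into a plain Python list.
terms = np.repeat(s.index.to_numpy(), s.to_numpy()).tolist()

# Index.repeat is the pandas counterpart and avoids calling NumPy directly.
terms_via_index = s.index.repeat(s.to_numpy()).tolist()

print(terms)
```

Both variants keep the original index order, so the labels appear in the same sequence as in the Series.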
Upvotes: 3