Reputation: 3179
I am trying to create a DataFrame whose first column ("Value") holds a multi-word string in each row, and whose remaining columns are labelled with the unique words found across all the strings in "Value". I want to fill it with the frequency of each word (column) in each string (row). In other words, I want a simple term-document matrix (TDM).
import itertools
import pandas as pd

rows = ['you want peace', 'we went home', 'our home is nice', 'we want peace at home']
col_list = [word.lower().split(" ") for word in rows]
set_col = set(list(itertools.chain.from_iterable(col_list)))
columns = set_col
ncols = len(set_col)
testDF = pd.DataFrame(columns = set_col)
testDF.insert(0, "Value", " ")
testDF["Value"] = rows
testDF.fillna(0, inplace=True)
irow = 0
for tweet in testDF["Value"]:
    for word in tweet.split(" "):
        for col in xrange(1, ncols):
            if word == testDF.columns[col]: testDF[irow, col] += 1
    irow += 1
testDF.head()
However, I am getting an error:
KeyError                                  Traceback (most recent call last)
<ipython-input-64-9a991295ccd9> in <module>()
     23         for col in xrange(1, ncols):
     24
---> 25             if word == testDF.columns[col]: testDF[irow, col] += 1
     26
     27     irow += 1

C:\Users\Tony\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
   1795             return self._getitem_multilevel(key)
   1796         else:
-> 1797             return self._getitem_column(key)
   1798
   1799     def _getitem_column(self, key):

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12280)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12231)()

KeyError: (0, 9)
I am not sure what is wrong, so I would appreciate your help. Also, if there is a cleaner way to do this (without the textmining package, which I had trouble installing), it would be great to learn!
Upvotes: 1
Views: 1227
Reputation: 90999
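I am not 100% sure what your complete program is trying to do, but with the following:
testDF[irow, col]
you probably meant to index a single cell of the DataFrame, with irow as the row and col as the column. You cannot use plain [] subscripting for that: DataFrame.__getitem__ treats the tuple (irow, col) as a single column label, which is why pandas goes into _getitem_column and raises KeyError: (0, 9). You should use .iloc (or .loc) instead. Example: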
if word == testDF.columns[col]: testDF.iloc[irow, col] += 1
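For reference, here is a minimal sketch of the whole loop with that .iloc fix applied. It assumes Python 3 (range instead of xrange) and a recent pandas, and it builds the frame pre-filled with zeros instead of using fillna; the word list is sorted only so the column order is reproducible. It also widens the inner loop to range(1, ncols + 1): after "Value" is inserted at position 0, the word columns sit at positions 1 to ncols, so xrange(1, ncols) never reaches the last word column.

```python
import itertools

import pandas as pd

rows = ['you want peace', 'we went home', 'our home is nice', 'we want peace at home']

# Unique words across all rows; sorted so the column order is reproducible
# (iterating a plain set gives an arbitrary order).
words = sorted(set(itertools.chain.from_iterable(r.lower().split(" ") for r in rows)))
ncols = len(words)

# One zero-filled column per unique word, then "Value" inserted at position 0.
testDF = pd.DataFrame(0, index=range(len(rows)), columns=words)
testDF.insert(0, "Value", rows)

# enumerate replaces the manual irow counter.
for irow, tweet in enumerate(testDF["Value"]):
    # Lower-case here too, to match how the column labels were built.
    for word in tweet.lower().split(" "):
        # Word columns occupy positions 1..ncols, so the upper bound is ncols + 1.
        for col in range(1, ncols + 1):
            if word == testDF.columns[col]:
                # .iloc takes (row position, column position).
                testDF.iloc[irow, col] += 1

print(testDF)
```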
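Use .iloc if you intended irow to be the 0-based row position (and col the 0-based column position). If irow is the actual index label of the DataFrame, you can use .loc instead of .iloc. Note that .loc is label-based for columns as well, so the column would then be the word itself (for example testDF.loc[irow, word]) rather than the integer col.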
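As for the cleaner way asked about in the question, here are two minimal sketches, assuming Python 3 and a recent pandas (variable names such as tokens, loc_df and counter_df are illustrative). The first keeps a loop but uses label-based .loc with the word as the column label, which removes the inner loop over column positions. The second avoids the per-cell loop by counting each row with collections.Counter from the standard library, so no text-mining package is needed.

```python
from collections import Counter

import pandas as pd

rows = ['you want peace', 'we went home', 'our home is nice', 'we want peace at home']

# Tokenise once; lower-casing mirrors how the question builds its columns.
tokens = [r.lower().split(" ") for r in rows]
words = sorted(set(w for toks in tokens for w in toks))

# Variant 1: label-based .loc. The row label is the default RangeIndex value
# and the column label is the word itself, so no inner column loop is needed.
loc_df = pd.DataFrame(0, index=range(len(rows)), columns=words)
for irow, toks in enumerate(tokens):
    for word in toks:
        loc_df.loc[irow, word] += 1

# Variant 2: one Counter (a dict subclass) per row; pandas aligns the dict
# keys into columns, and words missing from a row become NaN, then 0.
counter_df = (
    pd.DataFrame([Counter(toks) for toks in tokens])
    .fillna(0)
    .astype(int)
    .reindex(columns=words, fill_value=0)
)

# Put the original strings back in front, as in the question.
counter_df.insert(0, "Value", rows)
print(counter_df)
```

Both variants should give the same counts; the Counter version is likely to be faster on larger inputs because it avoids one .loc assignment per word.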
Upvotes: 2