Reputation: 93
I am trying to add new feature columns to the data for my ML model. A numeric column should be created for each specific word that is found in the text of the scraped data. To test this I wrote a dummy script.
import pandas as pd

bagOfWords = ["cool", "place"]
wordsFound = ""
mystring = "This is a cool new place"
mystring = mystring.lower()
for word in bagOfWords:
    if word in mystring:
        wordsFound = wordsFound + word + " "
print(wordsFound)
pd.get_dummies(wordsFound)
The output is

   cool place
0           1

So the result has one row (index 0) and a single column "cool place". That is not what I want. I expected one column per word:

   cool  place
0     1      1
Upvotes: 0
Views: 353
Reputation: 93
I found a different solution, as I could not find any way forward. It is a simple, direct one-hot encoding: for every word I need, I add a new column to the dataframe and fill in the encoding directly.
vocabulary = ["achtung", "suchen"]
for word in vocabulary:
    df2[word] = 0  # new column, default 0
    for index, row in df2.iterrows():
        if word in row["title"].lower():
            # DataFrame.set_value was removed in pandas 1.0; .at sets a single cell
            df2.at[index, word] = 1
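The row-by-row loop above can probably be replaced with pandas string methods, which avoids `iterrows` altogether. This is only a minimal sketch: the sample titles are made up for illustration, and it assumes `df2` has a text column named `title` as in the snippet above.

```python
import pandas as pd

# Hypothetical sample data; only the "title" column name comes from the snippet above.
df2 = pd.DataFrame({"title": ["Achtung, Baustelle", "Wir suchen Hilfe", "Nichts davon"]})
vocabulary = ["achtung", "suchen"]

# Lowercase once, then test every vocabulary word against all rows at a time.
titles = df2["title"].str.lower()
for word in vocabulary:
    # regex=False gives a plain substring test, the same check as `word in title`.
    df2[word] = titles.str.contains(word, regex=False).astype(int)

print(df2)
```

Like the loop, this is a substring match, so "suchen" would also match "besuchen". If whole words are needed, the titles would have to be tokenized first.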
Upvotes: 0