Reputation: 343
I have a dataframe that contains a text column and a result column:
             Text  Result
0    some text...    True
1  another one...   False
I also have a function that extracts features from text: it returns a dict with about 1000 keys that are words, with True/False values depending on whether the word was in the text.
words = ["some", "text", "another", "one", "other", "words"]
def extract(text):
    result = dict()
    for w in words:
        result[w] = (w in text)
    return result
The result I am expecting is:
             Text   some   text  another    one  other  words  Result
0    some text...   True   True    False  False  False  False    True
1  another one...  False  False     True   True  False  False   False
But I don't know how to apply this to a dataframe. What I have done so far is create the columns with a default value of False, but I have no clue how to populate them with True values.
for feature in words:
    df[feature] = False
I guess there is a better way to do this in pandas?
Upvotes: 3
Views: 59
Reputation: 294278
Use pd.Series.str.get_dummies with pd.DataFrame.reindex
exp = (
    df.Text.str.get_dummies(' ')
      .reindex(columns=words, fill_value=0)
      .astype(bool)
)
# Use the columns= keyword; the positional axis argument is not accepted by recent pandas
df.drop(columns='Result').join(exp).join(df.Result)
          Text   some   text  another    one  other  words  Result
0    some text   True   True    False  False  False  False    True
1  another one  False  False     True   True  False  False   False
Explanation
get_dummies gives dummy columns for each word found, simple enough. However, I use reindex in order to represent all the words we care about. The fill_value and astype(bool) are there to match OP's output. I use drop and join(df.Result) as a pithy way to get Result to the end of the dataframe.
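One caveat worth checking: get_dummies(' ') matches whole space-separated tokens, so a token such as text... (with trailing punctuation) would not line up with the word text. The sketch below is an assumption about how one might handle that, not part of the answer above. It strips punctuation first (the regex is an illustrative choice) and also shows a substring-based variant with str.contains that mirrors the `w in text` test from the question. The names cleaned, substrings and out are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Text": ["some text...", "another one..."],
    "Result": [True, False],
})
words = ["some", "text", "another", "one", "other", "words"]

# Whole-word matching: drop punctuation so "text..." becomes "text",
# then build dummy columns and keep only the words of interest.
cleaned = (
    df["Text"].str.replace(r"[^\w\s]", "", regex=True)
    .str.get_dummies(" ")
    .reindex(columns=words, fill_value=0)
    .astype(bool)
)

# Substring matching: same semantics as `w in text` in the question.
substrings = pd.DataFrame(
    {w: df["Text"].str.contains(w, regex=False) for w in words},
    index=df.index,
)

# Put Result back at the end, as in the answer above.
out = df.drop(columns="Result").join(cleaned).join(df["Result"])
print(out)
```

The two variants can disagree: substring matching also flags other in the second row, because "another" contains "other", whereas whole-word matching does not.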
Upvotes: 2
Reputation: 792
You can apply a function to a dataframe column like this:
def func(value):  # some function that you want to apply to each value in a column
    return None
new_col = df['column_name'].apply(func)
After that you can add new_col to your existing dataframe as a new column.
There is also a similar method, DataFrame.apply, for applying a function to an entire dataframe.
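As a rough illustration of the two forms of apply, here is a minimal sketch. The helper names text_length and describe and the Length column are made up for the example; only Series.apply and DataFrame.apply with axis=1 are actual pandas calls.

```python
import pandas as pd

df = pd.DataFrame({
    "Text": ["some text...", "another one..."],
    "Result": [True, False],
})

def text_length(text):
    # Receives one value from the column at a time
    return len(text)

def describe(row):
    # Receives one whole row (a Series) at a time because of axis=1
    return f"{row['Text']} -> {row['Result']}"

# Series.apply: element-wise over a single column
df["Length"] = df["Text"].apply(text_length)

# DataFrame.apply with axis=1: row-wise over the whole dataframe
labels = df.apply(describe, axis=1)
print(df)
print(labels)
```

Row-wise apply is convenient but runs a Python function once per row, so for large frames a vectorized string method is usually worth trying first.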
Edit:
import pandas as pd

df = pd.DataFrame(['some text...', 'another one...'], columns=['Text'])
words = ["some", "text", "another", "one", "other", "words"]

def extract(text):
    result = dict()
    for w in words:
        result[w] = (w in text)
    # Return the dict itself so the keys can become column names
    return result

# apply() yields one dict per row; build the new columns from that list of dicts
new_cols = pd.DataFrame(df['Text'].apply(extract).tolist(), index=df.index)
result_df = pd.concat([df, new_cols], axis=1)
Upvotes: 0