Ala Głowacka

Reputation: 343

Executing a function that adds columns and populates them depending on other columns in Pandas

I have a dataframe that contains a text column and a result column:

             Text    Result
0  some text...      True
1  another one...    False

I also have a function that extracts features from text: it returns a dict with about 1000 keys, each a word, with a True/False value depending on whether the word appears in the text.

words = ["some", "text", "another", "one", "other", "words"]

def extract(text):
    result = dict()
    for w in words:
        result[w] = (w in text)
    return result

The result I am expecting is:

             Text   some   text  another    one  other  words  Result
0    some text...   True   True    False  False  False  False    True
1  another one...  False  False     True   True  False  False   False

But I don't know how to apply this to a dataframe. What I have done so far is create the columns with a default value of False, but I have no clue how to populate them with True values.

for feature in words:
    df[feature] = False

I guess there is a better way to do this in pandas?

Upvotes: 3

Views: 59

Answers (2)

piRSquared

Reputation: 294278

Use pd.Series.str.get_dummies with pd.DataFrame.reindex

exp = (
    df.Text.str.get_dummies(' ')
      .reindex(columns=words, fill_value=0)
      .astype(bool)
)

df.drop(columns='Result').join(exp).join(df.Result)

          Text   some   text  another    one  other  words  Result
0    some text   True   True    False  False  False  False    True
1  another one  False  False     True   True  False  False   False

Explanation

get_dummies gives a dummy column for each word found, simple enough. However, I use reindex so that the result has a column for every word we care about, including words that never appear. The fill_value and astype(bool) are there to match the OP's output. I use drop and join(df.Result) as a pithy way to move Result to the end of the dataframe.
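For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample dataframe is an assumption based on the question, with the trailing ellipses dropped so that each text splits into clean tokens; the names dummies, features and out are just illustrative.

```python
import pandas as pd

# Sample data modelled on the question (assumption: no trailing "...").
df = pd.DataFrame({"Text": ["some text", "another one"], "Result": [True, False]})
words = ["some", "text", "another", "one", "other", "words"]

# One 0/1 column per space-separated token found in Text.
dummies = df["Text"].str.get_dummies(sep=" ")

# Keep only the words of interest; words that never occur become all-zero columns.
features = dummies.reindex(columns=words, fill_value=0).astype(bool)

# Put the feature columns between Text and Result.
out = df.drop(columns="Result").join(features).join(df["Result"])
print(out)
```

Note that get_dummies matches whole tokens, while the `w in text` check in the question matches substrings, so the two can disagree (for example, a token like "one..." would not count as "one" here).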

Upvotes: 2

Max Mikhaylov

Reputation: 792

You can apply a function to a dataframe column like this:

def func(value):  # some function that you want to apply to each value in a column
    return None

new_col = df['column_name'].apply(func)

After that you can add new_col to your existing dataframe as a new column.

There's also a similar method for applying a function to an entire dataframe (DataFrame.apply).
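As a rough sketch of the dataframe-wide version, assuming DataFrame.apply with axis=1 is what you need: the function then receives one row at a time as a Series. The sample data and the describe_row helper are made up for illustration.

```python
import pandas as pd

# Sample data modelled on the question (assumption).
df = pd.DataFrame({"Text": ["some text", "another one"], "Result": [True, False]})

def describe_row(row):
    # Hypothetical helper: row is a Series holding one row's values, keyed by column name.
    return f"{row['Text']} -> {row['Result']}"

# axis=1 passes each row to the function; the default axis=0 passes each column.
summary = df.apply(describe_row, axis=1)
print(summary)
```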

Edit:

import pandas as pd

df = pd.DataFrame(['some text...', 'another one...'], columns=['Text'])
words = ["some", "text", "another", "one", "other", "words"]

def extract(text):
    # Map each word to True/False depending on whether it occurs in the text.
    return {w: (w in text) for w in words}

# Expand the per-row dicts into one boolean column per word.
new_cols = pd.DataFrame(df['Text'].apply(extract).tolist(), index=df.index)
result_df = pd.concat([df, new_cols], axis=1)

Upvotes: 0
