How to add new Dataframe Column with Dictionary Key, if the Value is found in a column text string

Question

I have a dataframe in which one column has text information.

print(df):

...   | ... |  Text                         |

...   | ... |  StringA. StringB. StringC    |
...   | ... |  StringZ. StringY. StringX    |
...   | ... |  StringL. StringK. StringJ    |
...   | ... |  StringA. StringZ. StringJ    |

I also have a dictionary that has the following:

dict = {'Dogs': ['StringA', 'StringL'],'Cats': ['StringB', 'StringZ', 'StringJ'],'Birds': ['StringK', 'StringY']}

EDIT: I have about 100 dictionary keys, each of which has 4+ values.

What I am hoping to do is create extra columns in the dataframe for each Key in the dictionary and then place a "1" in the column when any of the Values from the dictionary appear.

Therefore the output I am trying to get is:

print(df):

...   | ... |  Text                         |   Dogs   |   Cats    |   Birds

...   | ... |  StringA. StringB. StringC    |   1      |   1       |   0
...   | ... |  StringZ. StringY. StringX    |   0      |   1       |   1
...   | ... |  StringL. StringK. StringJ    |   1      |   1       |   1
...   | ... |  StringA. StringZ. StringJ    |   1      |   1       |   0

EDIT: The issue is I'm not sure how to search for the values within the text column and then return a 1 if found to the Key column. Any help would be much appreciated! Thanks!

Abhishek J · Accepted Answer

import pandas as pd

d = {'Dogs': ['StringA', 'StringL'],'Cats': ['StringB', 'StringZ', 'StringJ'],'Birds': ['StringK', 'StringY']}
df = pd.DataFrame({'Text': ['StringA. StringB. StringC', 'StringZ. StringY. StringX', 'StringL. StringK. StringJ',
                            'StringA. StringZ. StringJ']})

for k, v in d.items():  # Iterate over each key and its list of values
    # Flag the row with 1 if any value occurs as a substring of its Text, else 0
    df[k] = df['Text'].apply(lambda text: int(any(s in text for s in v)))

# Output
                        Text  Dogs  Cats  Birds
0  StringA. StringB. StringC     1     1      0
1  StringZ. StringY. StringX     0     1      1
2  StringL. StringK. StringJ     1     1      1
3  StringA. StringZ. StringJ     1     1      0

This solution may be suboptimal if the strings are long or there are many of them, in which case you may have to add an additional column with some sort of trie data structure.

But the above solution should work for most moderate cases.
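If the row-wise apply gets slow with around 100 keys, one possible alternative is to let pandas do the matching per column with `Series.str.contains`. The sketch below is not part of the original answer: the helper name `add_key_flags` and its `text_col` parameter are made up for illustration, and it assumes plain substring matching is what you want (so 'StringA' would also match inside 'StringAB').

```python
import re
import pandas as pd


def add_key_flags(df, mapping, text_col="Text"):
    """Hypothetical helper: add one 0/1 column per key in `mapping`."""
    out = df.copy()  # leave the caller's dataframe untouched
    for key, values in mapping.items():
        # Escape each value so characters like '.' are matched literally,
        # then join them into a single alternation pattern.
        pattern = "|".join(re.escape(v) for v in values)
        # na=False treats missing text as "no match" instead of NaN.
        out[key] = out[text_col].str.contains(pattern, regex=True, na=False).astype(int)
    return out


d = {'Dogs': ['StringA', 'StringL'],
     'Cats': ['StringB', 'StringZ', 'StringJ'],
     'Birds': ['StringK', 'StringY']}
df = pd.DataFrame({'Text': ['StringA. StringB. StringC',
                            'StringZ. StringY. StringX',
                            'StringL. StringK. StringJ',
                            'StringA. StringZ. StringJ']})

result = add_key_flags(df, d)
print(result)
```

This builds one regex per key rather than testing each value separately for every row, so the Python-level loop runs once per key instead of once per row. Whether it is actually faster depends on your data, so it is worth timing both versions on a sample.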
