Reputation: 116
I have two dataframes:
df = list of strings:
Strings |
---|
text xxx text |
text yyy text |
text www text |
text eee text |
text xxx yyyy text |
tag_matrix_df = matrix of columns, labels and search words
Columns | Search | Labels |
---|---|---|
Col1 | xxx | label1 |
Col1 | yyy | label2 |
Col2 | www | label3 |
Col2 | eee | label3 |
Col2 | xxx yyyy | label4 |
I need to add columns to df based on tag_matrix_df when search words are found within the strings from df, for example: df =
Words | Col1 | Col2 |
---|---|---|
text xxx text | label1 | |
text yyy text | label2 | |
text www text | label3 | |
text eee text | label3 | |
text xxx yyyy text | label4 |
I have the following code, which seems to be working ok but it's very slow (tag_matrix_df > 5000 rows).
#df = Dataframe of words
#tag_matrix_df = datafreame of columns, labels and search strings
....
def assign_label(kw, matrix_df):
kw = kw.lower().strip()
for ii, tag in matrix_df.iterrows():
find_tag = tag['Find'].lower().strip()
if kw != None and find_tag in kw.split():
return tag['Label']
flag_cols = tag_matrix_df['Flag Name'].unique()
for flag in flag_cols:
filtered_matrix_df = tag_matrix_df.loc[tag_matrix_df['Flag Name'] == flag]
df[flag] = df.apply(lambda row: assign_label(row[0], filtered_matrix_df), axis=1)
Please also note the issue with the first and last row for tag_matrix_df where we have "xxx" within "xxx yyyy". In this instance, it should match to xxx yyyy as this is the word.
Any suggestions on a smarter implementation of this?
Upvotes: 0
Views: 54
Reputation: 120479
Use pd.merge
:
df = pd.merge(df, tag_matrix_df, left_on='Words', right_on='Search', how='left')
df = df.pivot(index='Words', columns='Columns', values='Labels').fillna('')
>>> df
Columns Col1 Col2
Words
eee label3
www label3
xxx label1
xxx yyyy label4
yyy label2
Upvotes: 0