Reputation: 15
I want to create a new column in a DataFrame whose value is chosen between two options by a function.
I have two lists:
lista = ["A", "B", "C", "D"]
listb = ["would not", "use to", "manage to", "when", "did not"]
I want to find the first value from lista that appears in the text and return it in a new column called "Effect". If none is found, then search for the values from listb and return the first one encountered, along with the next 2 words after it.
Example:
I have tried something like this:
def matcher(Description):
    for i in lista:
        if i in Description:
            return i
    return "Not found"

def matcher(Description):
    for j in listb:
        if j in Description:
            return j + 1
    return "Not found"

df["Effect"] = df.apply(lambda i: matcher(i["Description"]), axis=1)
df["Effect"] = df.apply(lambda j: matcher(j["Description"]), axis=1)
Upvotes: 0
Views: 81
Reputation:
The code below should do what you want to achieve:
def matcher(sentence):
    match_list = [substr for substr in lista
                  if substr in [word
                                for word in sentence.replace(',', ' ').split(" ")]]
    if match_list:  # list with items evaluates to True, empty list to False
        return match_list[0]
    match_list = [substr for substr in listb if ' ' + substr + ' ' in sentence]
    if match_list:
        substr = match_list[0]
        return substr + " " + sentence.split(substr)[-1].replace(',', ' ').strip().split(" ")[0]
    return "Not found"

df["Effect"] = df.Description.apply(matcher)
If the sentences contain punctuation other than ',', consider replacing .replace(',',' ') with a regular-expression substitution that turns every non-letter character in the sentence into a space (so that words are guaranteed to stay separated). Also be aware that some unusual combinations of substrings and sentences can have unexpected side effects.
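A minimal sketch of that regular-expression replacement, assuming only ASCII letters count as word characters. The helper name split_words is hypothetical and not part of the answer's code:

```python
import re

def split_words(sentence):
    # Replace every run of non-letter characters with a single space,
    # then split on whitespace so punctuation never sticks to a word.
    return re.sub(r"[^A-Za-z]+", " ", sentence).split()

words = split_words("It was noted that A; was present (and B, too).")
print(words)
# ['It', 'was', 'noted', 'that', 'A', 'was', 'present', 'and', 'B', 'too']
```

The resulting list could stand in for sentence.replace(',',' ').split(" ") in the lista check. Note that this pattern also drops digits and accented letters, which may or may not be what the data needs.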
UPDATE: code for appending any number of words after the substring matched from listb (requested in the comments), along with explanations of how the code works:
lista = ['A', 'B', 'C', 'D']
listb = ["unable to", "would not", "was not", "did not", "there is not", "could not", "failed to", "use to", "manage to", "when"]
# ^-- listb extended with phrases from another question on the same subject
# I want the following; for example, given this text:
sentence1 = "During procedure it was noted that A, was present and were notified to deparment."
# The text above contains A, so only the value A is returned in the new column.
sentence2 = "During procedure it was noted that product did not inject as expected."
# In the text above I want to find "did not" and return it along with
# the next N words ("did not inject" for N=1 and "did not inject as" for N=2)
def matcher(sentence, no_words=1):
    # First find a match from lista:
    match_list = [substr for substr in lista
                  if substr in [word
                                for word in sentence.replace(',', ' ').split(" ")]]
    if match_list:  # list with items evaluates to True, empty list to False
        return match_list[0]  # if a match is found in lista, exit the function with return
    # There was no match from lista, so find a match from listb:
    match_list = [substr for substr in listb if ' ' + substr + ' ' in sentence]
    if match_list:
        substr = match_list[0]
        # To return substr along with additional words from the sentence:
        # sentence.split(substr) splits the sentence on substr, and the index [-1]
        # takes the last element of the resulting list, i.e. the text after the
        # substring ([1] would do it too if substr occurs only once).
        # .replace(',',' ') handles words separated by ',' instead of ' '.
        # .strip() removes whitespace at the start and end of the extracted text.
        # .split(" ") creates a list of the words after substr, and the slice
        # [0:no_words] takes 'no_words' words from this list, which ' '.join()
        # joins into one string that is appended to substr:
        return substr + " " + ' '.join(sentence.split(substr)[-1].replace(',', ' ').strip().split(" ")[0:no_words])
    # There was no match from lista or listb (no value has been returned yet), so:
    return "Not found"
print(matcher(sentence1))
print(matcher(sentence2)) # no_words=1 is default
print(matcher(sentence2, 2))
The code above outputs:
A
did not inject
did not inject as
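To make the chained expression in the final return easier to follow, here is the same computation split into intermediate steps for sentence2 with no_words=2. The variable names tail, cleaned and words are only illustrative and do not appear in the function above:

```python
sentence = "During procedure it was noted that product did not inject as expected."
substr = "did not"
no_words = 2

tail = sentence.split(substr)[-1]         # ' inject as expected.'
cleaned = tail.replace(',', ' ').strip()  # 'inject as expected.'
words = cleaned.split(" ")                # ['inject', 'as', 'expected.']
result = substr + " " + ' '.join(words[0:no_words])
print(result)  # did not inject as
```

With a DataFrame, the extra argument can be forwarded through Series.apply, for example df["Effect"] = df["Description"].apply(matcher, no_words=2).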
Upvotes: 1
Reputation: 54698
You can do both at once:
def matcher(Description):
    w = [i for i in lista if i in Description]
    w.extend([i for i in listb if i in Description])
    if not w:
        return "Not found"
    else:
        return ' '.join(w)

df["Effect"] = df.apply(lambda i: matcher(i["Description"]), axis=1)
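One thing to keep in mind with this approach: i in Description is a substring test on the whole string, so a single letter from lista can match inside a longer word. A small sketch, using the example sentence from the other answer and assuming lista holds quoted strings, compares it with a whole-word test:

```python
lista = ['A', 'B', 'C', 'D']
sentence = "During procedure it was noted that product did not inject as expected."

# Substring test: 'D' is found inside the word "During".
substring_hits = [i for i in lista if i in sentence]

# Whole-word test: compare against the individual words instead.
words = sentence.replace(',', ' ').split()
word_hits = [i for i in lista if i in words]

print(substring_hits)  # ['D']
print(word_hits)       # []
```

If only standalone values such as "A" should count, splitting the description into words first (as in the second list comprehension) avoids these accidental hits.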
Upvotes: 0