Reputation: 21
This is my first post on Stack Overflow :) I've started learning Python and the pandas library. I have a problem finding text in a DataFrame cell.
Program:
Import two CSV files (no problem here):
DataFrame 1:
Column1 | Column2
546852 | Lorem ipsum dolor sit amet
248597 | Amet luctus venenatis lectus magna fringilla.
842457 | Neque egestas congue quisque egestas.
8465 | Amet luctus venenatis lectus
648 | Neque egestas congue
55 | Lorem ipsum dolor
DataFrame 2:
DATA
Lorem
Lectus
Congue
etc.
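To reproduce the examples without the CSV files, the two frames shown above can be built directly. This is a minimal sketch that assumes the column names from the question (`Column1`, `Column2`, `DATA`) and uses only the three search words that are actually listed, since the rest ("etc.") is not given.

```python
import pandas as pd

# Sample data copied from the question. In the real program these frames
# come from two CSV files (for example via pd.read_csv); that step is skipped here.
df1 = pd.DataFrame({
    "Column1": [546852, 248597, 842457, 8465, 648, 55],
    "Column2": [
        "Lorem ipsum dolor sit amet",
        "Amet luctus venenatis lectus magna fringilla.",
        "Neque egestas congue quisque egestas.",
        "Amet luctus venenatis lectus",
        "Neque egestas congue",
        "Lorem ipsum dolor",
    ],
})
# Only the listed search words; the remaining ones ("etc.") are unknown.
df2 = pd.DataFrame({"DATA": ["Lorem", "Lectus", "Congue"]})

print(df1)
print(df2)
```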
My question: how can I find the words from DataFrame 2 (Lorem, Lectus, Congue, etc.) in Column2 of DataFrame 1 and generate a DataFrame with 3 columns:
Column1 | Column2 | Column3
546852 | **Lorem** ipsum dolor sit amet | Lorem
248597 | Amet luctus venenatis **lectus** magna fringilla. | Lectus
842457 | Neque egestas **congue** quisque egestas. | Congue
8465 | Amet luctus venenatis **lectus** | Lectus
648 | Neque egestas **congue** | Congue
55 | **Lorem** ipsum dolor | Lorem
I've searched Google but didn't find a solution, so I've finally dared to write a post on Stack Overflow :)
Upvotes: 2
Views: 884
Reputation: 30940
Use Series.apply with a lambda function and a list comprehension, which also handles the case where more than one word matches in a cell:
df1['Column3']=df1['Column2'].apply(lambda x: [word for word in df2['DATA'] if word.upper() in x.upper()])
print(df1)
Column1 Column2 Column3
0 546852 Lorem,ipsum,dolor,sit,amet [Lorem]
1 248597 Amet,luctus,venenatis,lectus,magna,fringilla. [Lectus]
2 842457 Neque,egestas,congue,quisque,egestas. [Congue]
3 8465 Amet,luctus,venenatis,lectus [Lectus]
4 648 Neque,egestas,congue [Congue]
5 55 Lorem,ipsum,dolor [Lorem]
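The `in` check above is a plain substring test, so a search word also matches when it is part of a longer word (for example "lectus" inside a hypothetical "collectus"). A minimal sketch of a whole-word, case-insensitive variant, assuming that whole-word matching is what is wanted, uses Python's `re` module with `\b` word-boundary anchors:

```python
import re

import pandas as pd

# Sample data from the question (only the three listed search words).
df1 = pd.DataFrame({
    "Column1": [546852, 248597, 842457, 8465, 648, 55],
    "Column2": [
        "Lorem ipsum dolor sit amet",
        "Amet luctus venenatis lectus magna fringilla.",
        "Neque egestas congue quisque egestas.",
        "Amet luctus venenatis lectus",
        "Neque egestas congue",
        "Lorem ipsum dolor",
    ],
})
df2 = pd.DataFrame({"DATA": ["Lorem", "Lectus", "Congue"]})

# One compiled, case-insensitive pattern per search word; the \b anchors
# require a whole-word match, so "lectus" does not match inside "collectus".
patterns = {
    word: re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for word in df2["DATA"].unique()
}

def matching_words(text):
    """Return every search word that occurs in `text` as a whole word."""
    return [word for word, pat in patterns.items() if pat.search(text)]

df1["Column3"] = df1["Column2"].apply(matching_words)
print(df1)
```

`re.escape` keeps any punctuation in the search words from being read as regex syntax; compiling the patterns once avoids rebuilding them for every row.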
Upvotes: 1
Reputation: 13447
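If you want to get all possible occurrences, you can use the following function. Keep in mind that you also have to handle letter case ("Lectus" vs "lectus"), so both the search words and the cell text are lowercased before comparing.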
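# unique() returns a NumPy array, which has .tolist() (not .to_list())
lst = [l.lower() for l in df2["DATA"].unique().tolist()]
def fun(x):
    x = x["Column2"].lower()  # lowercase the cell text as well
    # keep every search word found in the text, restored to capitalized form
    return [l.capitalize() for l in lst if l in x]
df1["Column3"] = df1.apply(fun, axis=1)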
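The same "all occurrences" result can also be sketched without a row-wise `apply` by using pandas' vectorized `Series.str.findall` with one regex alternation of all search words. This is an alternative technique, not the function above; it assumes whole-word matching is wanted and that `Column2` has no missing values (for NaN cells `findall` returns NaN and the lambda would fail).

```python
import re

import pandas as pd

# Sample data from the question (only the three listed search words).
df1 = pd.DataFrame({
    "Column1": [546852, 248597, 842457, 8465, 648, 55],
    "Column2": [
        "Lorem ipsum dolor sit amet",
        "Amet luctus venenatis lectus magna fringilla.",
        "Neque egestas congue quisque egestas.",
        "Amet luctus venenatis lectus",
        "Neque egestas congue",
        "Lorem ipsum dolor",
    ],
})
df2 = pd.DataFrame({"DATA": ["Lorem", "Lectus", "Congue"]})

words = list(df2["DATA"].unique())
# Map the lowercase form of each word back to its spelling in df2.
canonical = {w.lower(): w for w in words}
# One alternation of all words inside \b anchors for whole-word matches;
# the group is non-capturing, so str.findall returns the full matched text.
pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"

df1["Column3"] = (
    df1["Column2"]
    .str.findall(pattern, flags=re.IGNORECASE)
    # Normalize each match's case and drop repeats while keeping their order.
    .apply(lambda found: list(dict.fromkeys(canonical[m.lower()] for m in found)))
)
print(df1)
```

`dict.fromkeys` is used only as an order-preserving de-duplication, so a word that appears twice in one cell is listed once.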
Upvotes: 1
Reputation: 2569
This is one way:
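def find_elements(row):
    # Return the first word from df2 (in df2's order) found in this row's Column2
    for element in df2.DATA.unique():
        # row.Column2 is a plain str, so use `in` instead of .str.contains
        if element.lower() in row.Column2.lower():
            return element
    return None  # no search word found in this row
df3 = df1.copy()
df3["Column3"] = df3.apply(find_elements, axis=1)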
That should work; of course, there are other ways to do it.
Edit: As mentioned by @vb_rises, if several words occur in the same sentence, the function only returns the first match.
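When only one match per row is wanted, a vectorized sketch is possible with `Series.str.extract` and a single capturing group. Note a small difference from the loop above: `extract` returns the leftmost whole-word match in the text, while the loop returns the first word in df2's order; both give the same result on the sample data. Whole-word, case-insensitive matching is an assumption here.

```python
import re

import pandas as pd

# Sample data from the question (only the three listed search words).
df1 = pd.DataFrame({
    "Column1": [546852, 248597, 842457, 8465, 648, 55],
    "Column2": [
        "Lorem ipsum dolor sit amet",
        "Amet luctus venenatis lectus magna fringilla.",
        "Neque egestas congue quisque egestas.",
        "Amet luctus venenatis lectus",
        "Neque egestas congue",
        "Lorem ipsum dolor",
    ],
})
df2 = pd.DataFrame({"DATA": ["Lorem", "Lectus", "Congue"]})

words = list(df2["DATA"].unique())
canonical = {w.lower(): w for w in words}
# A single capturing group around the alternation: with expand=False,
# str.extract returns a Series holding the first match per cell, or NaN.
pattern = r"\b(" + "|".join(map(re.escape, words)) + r")\b"

df3 = df1.copy()
df3["Column3"] = (
    df3["Column2"]
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)
    .str.lower()     # NaN (no match) stays NaN
    .map(canonical)  # restore the spelling used in df2
)
print(df3)
```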
Upvotes: 1