Damian Koźniewski

Reputation: 21

Python Pandas Dataframe search text in the cells

This is my first post on Stack Overflow :) I've started learning Python and the pandas library, and I have a problem finding text in DataFrame cells.

My program:

Import two CSV files (no problem here):

DataFrame 1:

Column1  | Column2
546852   | Lorem ipsum dolor sit amet
248597   | Amet luctus venenatis lectus magna fringilla.
842457   | Neque egestas congue quisque egestas.
8465     | Amet luctus venenatis lectus
648      | Neque egestas congue
55       | Lorem ipsum dolor

DataFrame 2:

DATA 
Lorem 
Lectus 
Congue
etc.

My question: how can I find the words from DataFrame 2 (Lorem, Lectus, Congue, etc.) in Column2 of DataFrame 1 and generate a DataFrame with three columns:

Column1  | Column2                                           | Column3
546852   | **Lorem** ipsum dolor sit amet                    | Lorem
248597   | Amet luctus venenatis **lectus** magna fringilla. | Lectus
842457   | Neque egestas **congue** quisque egestas.         | Congue
8465     | Amet luctus venenatis **lectus**                  | Lectus
648      | Neque egestas **congue**                          | Congue
55       | **Lorem** ipsum dolor                             | Lorem

I've searched Google but didn't find a solution, so I've finally dared to write a post on Stack Overflow :)

Upvotes: 2

Views: 884

Answers (3)

ansev

Reputation: 30940

Use Series.apply with a lambda containing a list comprehension; this also handles the case where more than one word matches in the same cell:

# Collect every DATA word that occurs (case-insensitively) in each Column2 cell
df1['Column3'] = df1['Column2'].apply(lambda x: [word for word in df2['DATA'] if word.upper() in x.upper()])
print(df1)

   Column1                                        Column2   Column3
0   546852                     Lorem,ipsum,dolor,sit,amet   [Lorem]
1   248597  Amet,luctus,venenatis,lectus,magna,fringilla.  [Lectus]
2   842457          Neque,egestas,congue,quisque,egestas.  [Congue]
3     8465                   Amet,luctus,venenatis,lectus  [Lectus]
4      648                           Neque,egestas,congue  [Congue]
5       55                              Lorem,ipsum,dolor   [Lorem]
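Note that word.upper() in x.upper() is a substring test, so a DATA entry such as "Lorem" would also match inside a longer word like "Loremipsum". A minimal sketch of a whole-word alternative, assuming the column names from the question, builds a single regex alternation with \b word boundaries and applies it with `Series.str.findall`; the `canonical` dictionary is a hypothetical helper that maps each match back to its spelling in DATA.

```python
import re

import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "Column1": [546852, 248597, 842457, 8465, 648, 55],
    "Column2": [
        "Lorem ipsum dolor sit amet",
        "Amet luctus venenatis lectus magna fringilla.",
        "Neque egestas congue quisque egestas.",
        "Amet luctus venenatis lectus",
        "Neque egestas congue",
        "Lorem ipsum dolor",
    ],
})
df2 = pd.DataFrame({"DATA": ["Lorem", "Lectus", "Congue"]})

# Hypothetical helper: lowercase form -> original spelling from DATA
canonical = {w.lower(): w for w in df2["DATA"]}

# \b...\b restricts matches to whole words; re.escape guards special characters
pattern = r"\b(?:" + "|".join(re.escape(w) for w in df2["DATA"]) + r")\b"

# findall gives a list of every match per cell, spelled as in Column2
matches = df1["Column2"].str.findall(pattern, flags=re.IGNORECASE)
df1["Column3"] = matches.apply(lambda found: [canonical[m.lower()] for m in found])
print(df1)
```

Like the apply-based answer, this yields a list per cell, but "lectus" inside a longer word would no longer count as a match.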

Upvotes: 1

rpanai

Reputation: 13447

If you want to get all possible occurrences, you can use the following function. Keep in mind that you also need to handle letter case, for example by lowercasing both the search words and the text.

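# Lowercase the search words once; Series.unique() returns a NumPy array,
# which has .tolist() (not .to_list())
lst = [l.lower() for l in df2["DATA"].unique().tolist()]

def fun(x):
    # Compare in lowercase so "Lectus" also matches "lectus"
    x = x["Column2"].lower()
    return [l.capitalize() for l in lst if l in x]

df1["Column3"] = df1.apply(fun, axis=1)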
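The function above returns a list per row, while the desired output in the question has one word per row. One way to get that layout is `DataFrame.explode` (available since pandas 0.25), which repeats the other columns once per list element. This is a sketch; the last row, "Lorem congue", is hypothetical and added only to show a cell with two matches.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Column1": [546852, 8465, 99],
    "Column2": [
        "Lorem ipsum dolor sit amet",
        "Amet luctus venenatis lectus",
        "Lorem congue",  # hypothetical row with two matching words
    ],
})
df2 = pd.DataFrame({"DATA": ["Lorem", "Lectus", "Congue"]})

# Same approach as the answer: lowercase search words, list of matches per row
lst = [w.lower() for w in df2["DATA"].unique().tolist()]

def fun(row):
    text = row["Column2"].lower()
    return [w.capitalize() for w in lst if w in text]

df1["Column3"] = df1.apply(fun, axis=1)

# One output row per matched word; rows with an empty list would get NaN
exploded = df1.explode("Column3").reset_index(drop=True)
print(exploded)
```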

Upvotes: 1

Florian Bernard

Reputation: 2569

This is one way:

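def find_elements(row):
    # Return the first DATA word found in Column2 (case-insensitive)
    for element in df2["DATA"].unique():
        # row["Column2"] is a plain str, so use `in` rather than .str.contains
        if element.lower() in row["Column2"].lower():
            return element
    return None  # no word matched

df3 = df1.copy()
df3["Column3"] = df3.apply(find_elements, axis=1)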

That should work; of course, there are other ways to do it.

Edit: As mentioned by @vb_rises, if several words appear in the same sentence, the function will only return the first match.
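If only one match per sentence is wanted, a vectorized sketch uses `Series.str.extract`, which returns the first match of a regex capture group and NaN where nothing matches. One difference: it returns the first word in sentence order, whereas the loop above returns the first word in DATA order. The word-boundary pattern and the `canonical` lookup are assumptions, not part of the answer.

```python
import re

import pandas as pd

df1 = pd.DataFrame({
    "Column1": [546852, 842457, 99],
    "Column2": [
        "Lorem ipsum dolor sit amet",
        "Neque egestas congue quisque egestas.",
        "No keyword here",  # hypothetical row without a match
    ],
})
df2 = pd.DataFrame({"DATA": ["Lorem", "Lectus", "Congue"]})

# Hypothetical helper: lowercase form -> original spelling from DATA
canonical = {w.lower(): w for w in df2["DATA"]}

# Exactly one capture group, so expand=False returns a Series
pattern = r"\b(" + "|".join(re.escape(w) for w in df2["DATA"]) + r")\b"

first = df1["Column2"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
# Map back to DATA spelling; NaN (no match) stays NaN
df1["Column3"] = first.str.lower().map(canonical)
print(df1)
```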

Upvotes: 1
