Ni_Tempe
Ni_Tempe

Reputation: 307

finding a string in pandas dataframe column and cell

I have a dataframe as below and I want to find how many times value from the column Jan occurs in the column URL and corresponding cell of the column URL.

I want to create 3 columns - found in cell and found in column and distinct finds For example when we search for value try from the first cell of the column Jan, it should return 1 in found in cell and 2 in 'found in columnand 2 indistinct findsbecause the word was found in 2 rows when we search for valuewhyfrom the second cell of the columnJan, it should return 0 infound in celland 2 in 'found in column and 2 in distinct finds because the word was found in 2 rows

i know how to search within a string. But how could i search within a cell and within a column?

s="ea2017-104.pdf bb cc for why"
s.lower().count("why")#to find text within string

sales = [{'account': '3', 'Jan': 'try', 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
             {'account': '1',  'Jan': 'why', 'Feb': '210', 'URL': 'try '},
             {'account': '2',  'Jan': 'bbbbb',  'Feb': '90',  'URL': 'ea2017-104.pdf bb cc for why' }]
df = pd.DataFrame(sales)
df

df['column_find']=df['URL'].str.lower().count('why')

final output will have 3 additional columns as below

found_inCell    found_in_column           distinct_finds
2                3                   2
0                2                   2
0                1                   1

update

I get an error when i try to run code when one of the cells in empty/np.nan

sales = [{'account': '3', 'Jan': np.nan, 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
             {'account': '1',  'Jan': 'try', 'Feb': '210', 'URL': 'try '},
             {'account': '2',  'Jan': 'bbbbb',  'Feb': '90',  'URL': 'ea2017-104.pdf bb cc for why' }]
df = pd.DataFrame(sales)
df

df['found_inCell'] = df.apply(lambda row: row['URL'].count(row['Jan']), axis=1)
df['found_in_column'] = df['Jan'].apply(lambda x: ''.join(df['URL'].tolist()).count(x))
df['distinct_finds'] = df['Jan'].apply(lambda x: sum(df['URL'].str.contains(x)))

Upvotes: 0

Views: 2038

Answers (1)

jpp
jpp

Reputation: 164843

Here is one way.

df['found_inCell'] = df.apply(lambda row: row['URL'].count(row['Jan']), axis=1)
df['found_in_column'] = df['Jan'].apply(lambda x: ''.join(df['URL'].tolist()).count(x))
df['distinct_finds'] = df['Jan'].apply(lambda x: sum(df['URL'].str.contains(x)))

#           Feb    Jan                           URL account  found_inCell  \
# 0  200 .jones    try  ea2018-001.pdf try bbbbb why       3             1   
# 1         210    why                          try        1             0   
# 2          90  bbbbb  ea2017-104.pdf bb cc for why       2             0   

#    found_in_column  distinct_finds  
# 0                2               2  
# 1                2               2  
# 2                1               1  

Upvotes: 2

Related Questions