Reputation: 307
I have a dataframe as below and I want to find how many times value from the column Jan
occurs in the column URL
and corresponding cell of the column URL
.
I want to create 3 columns - found in cell
and found in column
and distinct finds
For example when we search for value try
from the first cell of the column Jan
, it should return 1 in found in cell
and 2 in 'found in columnand 2 in
distinct findsbecause the word was found in 2 rows
when we search for value
whyfrom the second cell of the column
Jan, it should return 0 in
found in celland 2 in 'found in column
and 2 in distinct finds
because the word was found in 2 rows
i know how to search within a string. But how could i search within a cell and within a column?
s="ea2017-104.pdf bb cc for why"
s.lower().count("why")#to find text within string
sales = [{'account': '3', 'Jan': 'try', 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
{'account': '1', 'Jan': 'why', 'Feb': '210', 'URL': 'try '},
{'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why' }]
df = pd.DataFrame(sales)
df
df['column_find']=df['URL'].str.lower().count('why')
final output will have 3 additional columns as below
found_inCell found_in_column distinct_finds
2 3 2
0 2 2
0 1 1
I get an error when i try to run code when one of the cells in empty/np.nan
sales = [{'account': '3', 'Jan': np.nan, 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
{'account': '1', 'Jan': 'try', 'Feb': '210', 'URL': 'try '},
{'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why' }]
df = pd.DataFrame(sales)
df
df['found_inCell'] = df.apply(lambda row: row['URL'].count(row['Jan']), axis=1)
df['found_in_column'] = df['Jan'].apply(lambda x: ''.join(df['URL'].tolist()).count(x))
df['distinct_finds'] = df['Jan'].apply(lambda x: sum(df['URL'].str.contains(x)))
Upvotes: 0
Views: 2038
Reputation: 164843
Here is one way.
df['found_inCell'] = df.apply(lambda row: row['URL'].count(row['Jan']), axis=1)
df['found_in_column'] = df['Jan'].apply(lambda x: ''.join(df['URL'].tolist()).count(x))
df['distinct_finds'] = df['Jan'].apply(lambda x: sum(df['URL'].str.contains(x)))
# Feb Jan URL account found_inCell \
# 0 200 .jones try ea2018-001.pdf try bbbbb why 3 1
# 1 210 why try 1 0
# 2 90 bbbbb ea2017-104.pdf bb cc for why 2 0
# found_in_column distinct_finds
# 0 2 2
# 1 2 2
# 2 1 1
Upvotes: 2