Reputation: 307
I have a question that builds on my earlier question. The code below runs fine and tells me whether search_string is present anywhere in a row. How can I modify the last line so that it returns the number of matches instead of 1 or 0? For example, for the first row it should return 4, because search_string appears in 4 places in that row.
import pandas as pd
sales = [{'account': 'Jones LLC jones', 'Jan': '150', 'Feb': '200', 'Mar': '140 jones jones'},
{'account': 'Alpha Co', 'Jan': 'Jones', 'Feb': '210', 'Mar': '215'},
{'account': 'Blue Inc', 'Jan': '50', 'Feb': '90', 'Mar': '95' }]
df = pd.DataFrame(sales)
df
search_string = 'Jones'
(df.apply(lambda x: x.str.contains(search_string))
.sum(axis=1).astype(int))
Upvotes: 0
Views: 2444
Reputation: 4744
Using the code from the previous question, we simply change the any method to a sum method. This adds up all of the 1's, which counts how many cells in a given row (axis=1) contain the search string.
import pandas as pd

## added an extra Jones to the 'Jan' column of the first row
sales = [{'account': 'Jones LLC', 'Jan': 'Jones', 'Feb': '200', 'Mar': '140'},
{'account': 'Alpha Co', 'Jan': 'Jones', 'Feb': '210', 'Mar': '215'},
{'account': 'Blue Inc', 'Jan': '50', 'Feb': '90', 'Mar': '95' }]
df = pd.DataFrame(sales)
df_list = []
for search_string in ['Jones', 'Co', 'Alpha']:
    # Use the method above, but rename the Series instead of assigning it
    # to a column, then append it to a list.
    df_list.append(df.apply(lambda x: x.str.contains(search_string))
                     .sum(axis=1)  ## HERE IS SUM in place of any
                     .astype(int)
                     .rename(search_string))
#concatenate the list of series into a DataFrame with the original df
df = pd.concat([df] + df_list, axis=1)
df
Out[2]:
   Feb    Jan  Mar    account  Jones  Co  Alpha
0  200  Jones  140  Jones LLC      2   0      0
1  210  Jones  215   Alpha Co      1   1      1
2   90     50   95   Blue Inc      0   0      0
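One caveat worth illustrating: str.contains yields a single True/False per cell, so summing it counts matching cells, not every occurrence inside a cell. The sketch below contrasts it with Series.str.count on the question's data. The variable names cells_with_match and total_matches are made up for this illustration.

```python
import pandas as pd

sales = [{'account': 'Jones LLC jones', 'Jan': '150', 'Feb': '200', 'Mar': '140 jones jones'},
         {'account': 'Alpha Co', 'Jan': 'Jones', 'Feb': '210', 'Mar': '215'},
         {'account': 'Blue Inc', 'Jan': '50', 'Feb': '90', 'Mar': '95'}]
df = pd.DataFrame(sales)

# One True/False per cell: the row sum is the number of cells that match.
cells_with_match = (df.apply(lambda col: col.str.contains('jones'))
                      .sum(axis=1).astype(int))

# One integer per cell: the row sum is the total number of occurrences.
total_matches = (df.apply(lambda col: col.str.count('jones'))
                   .sum(axis=1).astype(int))

print(cells_with_match.tolist())  # [2, 0, 0]
print(total_matches.tolist())     # [3, 0, 0]
```

In the first row, 'Mar' holds '140 jones jones', which is one matching cell but two occurrences, hence 2 versus 3.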
Upvotes: 1
Reputation: 153460
You can use findall and .str.len:
import pandas as pd
sales = [{'account': 'Jones LLC jones', 'Jan': '150', 'Feb': '200', 'Mar': '140 jones jones'},
{'account': 'Alpha Co', 'Jan': 'Jones', 'Feb': '210', 'Mar': '215'},
{'account': 'Blue Inc', 'Jan': '50', 'Feb': '90', 'Mar': '95' }]
df = pd.DataFrame(sales)
df
search_string = 'jones' #Note changed to lowercase j to find more data.
(df.apply(lambda x: x.str.findall(search_string).str.len())
.sum(axis=1).astype(int))
Output:
0 3
1 0
2 0
dtype: int32
Adding @Vaishali's edit to the solution, which lowercases each cell first so that the match is case-insensitive:
df.apply(lambda x: x.str.lower().str.findall(search_string).str.len()).sum(axis=1).astype(int)
Output:
0 4
1 1
2 0
dtype: int32
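As a possible alternative to lowercasing, here is a minimal sketch that assumes Series.str.count with a regex flag is acceptable for your data. It counts matches per cell directly and ignores case through re.IGNORECASE. Wrapping the search string in re.escape is an extra precaution added here (not part of the answer above) so that characters such as '.' or '+' are treated literally.

```python
import re
import pandas as pd

sales = [{'account': 'Jones LLC jones', 'Jan': '150', 'Feb': '200', 'Mar': '140 jones jones'},
         {'account': 'Alpha Co', 'Jan': 'Jones', 'Feb': '210', 'Mar': '215'},
         {'account': 'Blue Inc', 'Jan': '50', 'Feb': '90', 'Mar': '95'}]
df = pd.DataFrame(sales)
search_string = 'jones'

# Count case-insensitive matches in every cell, then add them up per row.
pattern = re.escape(search_string)  # treat the search string as literal text
counts = (df.apply(lambda col: col.str.count(pattern, flags=re.IGNORECASE))
            .sum(axis=1).astype(int))

print(counts.tolist())  # [4, 1, 0]
```

This gives the same per-row totals as the lowercased findall version without building intermediate lists of matches.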
Upvotes: 2