Reputation: 1280
I have a column that contains rather lengthy strings. Each string may or may not contain substrings such as 'H 07', 'H 06' or 'F 13'. I would like to count the occurrences of these substrings in each cell and store the result in a new cell. The original cell value is
df.iloc[0,0]
'rfgergerggr H 07 jgjg gjgj H 06 gjhgj H 06 '.
The result of the procedure should be a new cell with
df.iloc[0,1]
{'H 07':1, 'H 06':2}
I imagine this could be done with the help of str.contains, but I am looking for about 50 different substrings and cannot think of a good way to search for all of them. I also think a more complex lambda could solve my problem, but I do not know how to build it.
So far I have tried str.contains, but it only shows whether the substring is present; I do not get the count. Also, to find all 50 substrings I am interested in, I would have to call str.contains once per substring. I think there should be a better way of doing that.
Upvotes: 0
Views: 604
Reputation: 9946
something like:
substrs = [...]
def f(cell_value):
    # count each substring in the cell and keep only the non-zero counts
    return {k: v for k, v in ((s, cell_value.count(s)) for s in substrs) if v}
df.column.apply(f)
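As a rough, self-contained sketch of the same idea applied to the sample string from the question: the column names "text" and "counts", the second row, and the three-item substring list are assumptions made for illustration only (the real list would hold the roughly 50 substrings).

```python
import pandas as pd

# Hypothetical data: column names and the short substring list are
# illustrative assumptions, not taken from the original dataframe.
substrs = ["H 07", "H 06", "F 13"]
df = pd.DataFrame(
    {"text": ["rfgergerggr H 07 jgjg gjgj H 06 gjhgj H 06 ", "no codes here"]}
)


def count_substrings(cell_value):
    # str.count returns the number of non-overlapping occurrences.
    counts = {s: cell_value.count(s) for s in substrs}
    # Drop substrings that do not appear in this cell.
    return {s: n for s, n in counts.items() if n}


# Store one dict per row in a new column.
df["counts"] = df["text"].apply(count_substrings)
print(df.loc[0, "counts"])  # {'H 07': 1, 'H 06': 2}
```

A cell that contains none of the substrings gets an empty dict. Note that str.count matches plain text anywhere in the string, so 'H 07' would also be counted inside a longer token such as 'XH 07'.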
Upvotes: 1