Reputation: 13
I have a Pandas df that has this structure:
Store CID Units_OH Count
1 23095 17_17_17_16_16_15_15_15_15_15_13_12_10_9_8_7_7... 15982
23101 6_6_5_5_5_5_4_3_3_3_7_6_5_5_5_5_5_5_3_2_2_5_5_... 15982
23117 6_6_6_6_6_6_6_6_6_6_6_6_5_5_5_4_3_3_3_3_3_3_3_... 15982
23161 6_6_6_6_6_6_6_6_6_6_6_5_5_5_4_4_4_4_4_3_3_3_3_... 15982
23222 5_5_5_5_5_5_5_5_4_4_4_4_3_3_3_3_3_3_3_3_3_3_7_... 15982
I need to count how many times a specific pattern occurs in the "Units_OH" column. For example, for each row I need to count how many times a positive number is followed by 0. I used "_" as the separator when I concatenated the field, so I'm looking for the pattern '_[1-9]_[0]_'. (Sorry about the format... first post here and I don't understand how to format the text correctly.)
I used this code to create that last column called 'Count':
ConcatOH['Count'] = ConcatOH['Units_OH'].str.count('_[1-9]_[0]_').sum()
However, as you can see, the count seems to run over the entire DataFrame and gives me the same value for every row. How can I do the count per row only? Is there an axis=0 argument I could use somewhere, or can somebody help me with how to use the apply method for this?
Upvotes: 1
Views: 192
Reputation: 7224
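Javier, do you mean something like this?
import re
# [1-9]\d* matches a positive integer; the original [\d+] was a character class matching one digit or a literal '+'
ConcatOH['Units_OH'].apply(lambda x: len(re.findall(r'_[1-9]\d*_0', x)))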
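One thing worth checking with any of these patterns is how they behave at the edges: a pattern that requires a leading "_" cannot match the first value in the string, and one that consumes a trailing "_" can skip a hit that starts right after the previous one. A minimal sketch of an alternative using lookarounds follows. The DataFrame `df`, its sample strings, and the name `pattern` are made up for illustration and are not from the original data.

```python
import re
import pandas as pd

# Hypothetical sample data in the same "_"-joined format as Units_OH.
df = pd.DataFrame({"Units_OH": ["3_0_2_0_0_5", "10_0_7_7", "0_0_0", "4_3_2_1"]})

# A positive integer (no leading zero), an underscore, then a lone 0.
# The lookarounds only check the neighbouring characters without consuming
# them, so a hit at the start of the string and back-to-back hits both count.
pattern = re.compile(r"(?<!\d)[1-9]\d*_0(?!\d)")

# One count per row, stored next to the source column.
df["Count"] = df["Units_OH"].apply(lambda s: len(pattern.findall(s)))
print(df)
```

Whether multi-digit values such as 10 should count as "a positive number" is an assumption here; if only single digits can occur, `[1-9]` alone would do.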
Upvotes: 0
Reputation: 14104
Remove the .sum() at the end of ConcatOH['Units_OH'].str.count('_[1-9]_[0]_').sum().
ConcatOH['Units_OH'].str.count('_[1-9]_[0]_')
returns a Series with one count per row. Calling .sum() on it collapses that Series into a single int, and that int is then assigned to every row of ConcatOH['Count'],
which is why you have the same value for each row.
You're basically doing
ConcatOH['Count'] = 15982
You want
ConcatOH['Count'] = ConcatOH['Units_OH'].str.count('_[1-9]_[0]_')
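To see the difference on something small, here is a rough sketch with a three-row frame. The names `df`, `per_row`, and `total` and the sample strings are invented for the example; only the `.str.count` call and the pattern come from the question.

```python
import pandas as pd

# Hypothetical data in the same "_"-joined format as Units_OH.
df = pd.DataFrame({"Units_OH": ["9_1_0_4", "2_2_2_2", "5_6_0_3_4_0_1"]})

# .str.count applies the regex to each string and returns one count per row.
per_row = df["Units_OH"].str.count("_[1-9]_[0]_")

# .sum() reduces that Series to a single number for the whole column.
total = per_row.sum()

df["Count"] = per_row   # a Series: each row gets its own value
df["Total"] = total     # a scalar: broadcast, so every row gets the same value
print(df)
```

Assigning a scalar to a column broadcasts it to every row, which matches the repeated 15982 in the question.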
Upvotes: 1