Use the apply function in Pandas to use a Regex count per row

Question

I have a Pandas df that has this structure:

Store  CID    Units_OH                                            Count
1      23095  17_17_17_16_16_15_15_15_15_15_13_12_10_9_8_7_7...   15982
       23101  6_6_5_5_5_5_4_3_3_3_7_6_5_5_5_5_5_5_3_2_2_5_5_...   15982
       23117  6_6_6_6_6_6_6_6_6_6_6_6_5_5_5_4_3_3_3_3_3_3_3_...   15982
       23161  6_6_6_6_6_6_6_6_6_6_6_5_5_5_4_4_4_4_4_3_3_3_3_...   15982
       23222  5_5_5_5_5_5_5_5_4_4_4_4_3_3_3_3_3_3_3_3_3_3_7_...   15982

I need to count how many times a specific pattern occurs in the "Units_OH" column. For example, I need to count, for every row, how many times a positive number is followed by 0. I used "_" as the separator when I concatenated the field, so I'm looking for the pattern '_[1-9]_[0]_'. (Sorry about the formatting; this is my first post here and I don't yet understand how to format the text correctly.)

I used this code to create that last column called 'Count':


ConcatOH['Count'] = ConcatOH['Units_OH'].str.count('_[1-9]_[0]_').sum()

However, as you can see, the count seems to run over the entire dataframe and gives me the same value for every row. How can I do the count by row only? Is there an axis=0 argument I could use somewhere, or can somebody help me with how to use the apply method for this?

Kenan · Accepted Answer

Remove the .sum() at the end of ConcatOH['Units_OH'].str.count('_[1-9]_[0]_').sum()

ConcatOH['Units_OH'].str.count('_[1-9]_[0]_') returns a Series, and you're then summing it to get a single int. That int is assigned to ConcatOH['Count'], which is why you have the same value for each row.

You're basically doing

ConcatOH['Count'] = 15982

You want

ConcatOH['Count'] = ConcatOH['Units_OH'].str.count('_[1-9]_[0]_')
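
To see the difference on a small case, here is a minimal sketch. The three sample rows are made up for illustration (the real ConcatOH data is truncated in the question), and the Count_apply column name is just a hypothetical label for the apply-based version asked about in the title.

```python
import re

import pandas as pd

# Made-up sample rows; the real ConcatOH frame is not shown in full.
ConcatOH = pd.DataFrame(
    {
        "CID": [23095, 23101, 23117],
        "Units_OH": ["3_2_1_0_0_4_0_1", "6_5_0_2_2_1", "5_5_5_4_3"],
    }
)

pattern = r"_[1-9]_[0]_"

# str.count works element-wise and returns one count per row.
per_row = ConcatOH["Units_OH"].str.count(pattern)

# Adding .sum() collapses those counts into a single scalar, which
# pandas would broadcast to every row on assignment.
total = per_row.sum()

# Assign the Series itself to get a per-row count.
ConcatOH["Count"] = per_row

# The same result with apply, using the standard re module.
compiled = re.compile(pattern)
ConcatOH["Count_apply"] = ConcatOH["Units_OH"].apply(
    lambda s: len(compiled.findall(s))
)

print(ConcatOH)
print("sum over all rows:", total)
```

The str.count version is usually the simpler choice; apply is only needed if you want custom logic per row. Note that this pattern, as written, only matches single-digit numbers with an underscore on both sides, so a value like 10 followed by 0 would not be counted.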
