How to count some rows within a record and make a new column with the total count?

Question

I have a DataFrame like the one below and want to add a new column with the total step count per ID. As you can see, ID 1 spans 5 rows.

+----+--------------------------------------------------------+
| ID |                         Steps                          |
+----+--------------------------------------------------------+
|  1 | <DIV>Another step</DIV><DIV>A step</DIV>               |
|    | <DIV>Another step</DIV><DIV>A step</DIV>               |
|    | <DIV>Another step</DIV><DIV>A step</DIV>               |
|    | <DIV>Another step</DIV><DIV>A step</DIV>               |
|    | <DIV>Another step</DIV><DIV>A step</DIV>               |
|  2 | <DIV>Another step</DIV>                                |
|    | <DIV>Something</DIV>                                   |
|    | <DIV>Something</DIV>                                   |
|    | <DIV>Something</DIV>                                   |
|    | <DIV>Something</DIV>                                   |
+----+--------------------------------------------------------+

I want to count the ‘DIV’ tags to get the total number of steps for each ID, and store that total in a new column:

+----+--------------------------------------------------------+-------------+
| ID |                         Steps                          | Total_Steps |
+----+--------------------------------------------------------+-------------+
|  1 | <DIV>Another step</DIV><DIV>A step</DIV>               |          10 |
|    | <DIV>Another step</DIV><DIV>A step</DIV>               |             |
|    | <DIV>Another step</DIV><DIV>A step</DIV>               |             |
|    | <DIV>Another step</DIV><DIV>A step</DIV>               |             |
|    | <DIV>Another step</DIV><DIV>A step</DIV>               |             |
|  2 | <DIV>Another step</DIV>                                |           5 |
|    | <DIV>Something</DIV>                                   |             |
|    | <DIV>Something</DIV>                                   |             |
|    | <DIV>Something</DIV>                                   |             |
|    | <DIV>Something</DIV>                                   |             |
|  3 | <DIV>Just a step</DIV>                                 |           4 |
|    | <DIV>Just a step</DIV>                                 |             |
|    | <DIV>Just a step</DIV>                                 |             |
|    | <DIV>Just a step</DIV>                                 |             |
+----+--------------------------------------------------------+-------------+

jezrael · Accepted Answer

Use Series.str.count with GroupBy.transform and sum:

df['Total_Steps'] = df['Steps'].str.count('<DIV>').groupby(df['ID'].ffill()).transform('sum')
print(df)
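   ID                                     Steps  Total_Steps
0   1  <DIV>Another step</DIV><DIV>A step</DIV>           10
1   1  <DIV>Another step</DIV><DIV>A step</DIV>           10
2   1  <DIV>Another step</DIV><DIV>A step</DIV>           10
3   1  <DIV>Another step</DIV><DIV>A step</DIV>           10
4   1  <DIV>Another step</DIV><DIV>A step</DIV>           10
5   2                   <DIV>Another step</DIV>            5
6   2                      <DIV>Something</DIV>            5
7   2                      <DIV>Something</DIV>            5
8   2                      <DIV>Something</DIV>            5
9   2                      <DIV>Something</DIV>            5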
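
To try this end to end, here is a minimal sketch that rebuilds the sample frame from the question. It assumes each step is wrapped in a <DIV>...</DIV> pair and that the blank ID cells are read in as NaN; both are assumptions about the real data, so adjust the pattern to whatever markup your Steps column actually contains.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the <DIV> wrappers and NaN for blank IDs are assumptions.
two_steps = '<DIV>Another step</DIV><DIV>A step</DIV>'
df = pd.DataFrame({
    'ID': [1] + [np.nan] * 4 + [2] + [np.nan] * 4,
    'Steps': [two_steps] * 5
             + ['<DIV>Another step</DIV>']
             + ['<DIV>Something</DIV>'] * 4,
})

# Count opening tags in each row (one per step).
per_row = df['Steps'].str.count('<DIV>')

# Forward-fill the ID so blank rows join the group above them,
# then broadcast each group's sum back onto every row.
df['Total_Steps'] = per_row.groupby(df['ID'].ffill()).transform('sum')

print(df['Total_Steps'].tolist())
```

Grouping by df['ID'].ffill() rather than df['ID'] matters here: rows with a missing ID would otherwise be dropped from every group.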

If you need the total only on the first row of each ID, add numpy.where with Series.duplicated:

import numpy as np

s = df['Steps'].str.count('<DIV>').groupby(df['ID'].ffill()).transform('sum')
# keep the total on the first row of each ID, NaN elsewhere
df['Total_Steps'] = np.where(df['ID'].duplicated(), np.nan, s)
# alternative: empty strings instead of NaN, but the column is then no longer numeric, so numeric functions may fail
#df['Total_Steps'] = np.where(df['ID'].duplicated(), '', s)
print(df)
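   ID                                     Steps  Total_Steps
0   1  <DIV>Another step</DIV><DIV>A step</DIV>         10.0
1   1  <DIV>Another step</DIV><DIV>A step</DIV>          NaN
2   1  <DIV>Another step</DIV><DIV>A step</DIV>          NaN
3   1  <DIV>Another step</DIV><DIV>A step</DIV>          NaN
4   1  <DIV>Another step</DIV><DIV>A step</DIV>          NaN
5   2                   <DIV>Another step</DIV>          5.0
6   2                      <DIV>Something</DIV>          NaN
7   2                      <DIV>Something</DIV>          NaN
8   2                      <DIV>Something</DIV>          NaN
9   2                      <DIV>Something</DIV>          NaN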
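
If the blank ID cells are NaN in your frame, df['ID'].duplicated() will not flag the first blank row of the frame, so that row would also receive a total. A possible variant, sketched under the same assumptions as before (<DIV>-wrapped steps, NaN for blank IDs), tests the forward-filled key instead and uses the nullable Int64 dtype so the totals stay integers rather than becoming 10.0 and 5.0:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the <DIV> wrappers and NaN for blank IDs are assumptions.
two_steps = '<DIV>Another step</DIV><DIV>A step</DIV>'
df = pd.DataFrame({
    'ID': [1] + [np.nan] * 4 + [2] + [np.nan] * 4,
    'Steps': [two_steps] * 5
             + ['<DIV>Another step</DIV>']
             + ['<DIV>Something</DIV>'] * 4,
})

key = df['ID'].ffill()  # every row now carries its group's ID
s = df['Steps'].str.count('<DIV>').groupby(key).transform('sum')

# Keep the total where the key appears for the first time, blank it elsewhere.
# Series.where inserts NaN; casting to Int64 turns those into <NA> and keeps integers.
df['Total_Steps'] = s.where(~key.duplicated()).astype('Int64')

print(df[['ID', 'Total_Steps']])
```

Whether to keep NaN floats or switch to Int64 is a matter of taste; the nullable dtype is just one option if the float display bothers you.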
