Reputation: 2286
The 'Name' column contains a persons name in one row, followed by the numbered task they need to perform (with a small description), all tasks are associated with that persons name until another persons name appears (So Tom has all the tasks below his name associated with Tom until Jim appears, then Jim is associated with all the tasks following his name, until the next name...and so on). So here is a sample of the data I have:
Name Three Digit Task Number of Days
Tom BLANK 0.00
1.1.6.1 Task Description 1.1.6 9.00
1.1.6.2 Task Description 1.1.6 8.25
1.1.1.4 Task Description 1.1.1 13.25
Jim BLANK 0.00
1.1.3.1 Task Description 1.1.3 8.75
1.2.1.1 Task Description 1.2.1 6.00
1.2.1.2 Task Description 1.2.1 12.75
So I would like to get a sum of the number of days for each person, grouped by the Three Digit Task. Would like it to look something like this:
Tom 1.1.1 13.25
Tom 1.1.6 17.25
Jim 1.1.3 8.75
Jim 1.2.1 18.75
So I have tried using:
import string
ALPHA = string.ascii_letters
df['Name'].str.startswith(tuple(ALPHA))
This returns a true/false if the 'Name' column starts with a letter or not (True for a letter, false otherwise). Was trying to say something like: in between the true (which would be peoples names) sum the 'Number of Days' by grouping on the 'Three Digit Task'
Upvotes: 2
Views: 602
Reputation: 294258
tl; dr
name_bool = df.Name.str.match('^[a-zA-Z]')
grp_keys = name_bool.cumsum()
grps = df.groupby(grp_keys)
tdt = 'Three Digit Task'
nod = 'Number of Days'
funcs = {'Name': 'first', nod: 'sum'}
dicts = {g.iloc[0, 0]: g.tail(-1).groupby(tdt).agg(funcs) for _, g in grps}
pd.concat(dicts)
explanation
use regex
to find which rows have a Name
column that starts with a letter.
name_bool = df.Name.str.match('^[a-zA-Z]')
name_bool
0 True
1 False
2 False
3 False
4 True
5 False
6 False
7 False
Name: Name, dtype: bool
use cumsum
to create a unique number for each contigous set of rows following a Name
grp_keys = name_bool.cumsum()
grp_keys
0 1
1 1
2 1
3 1
4 2
5 2
6 2
7 2
Name: Name, dtype: int64
create pandas groupby
object
grps = df.groupby(grp_keys)
use agg
and pd.concat
to created final pd.DataFrame
funcs = {'Name': 'first', nod: 'sum'}
dicts = {g.iloc[0, 0]: g.tail(-1).groupby(tdt).agg(funcs) for _, g in grps}
pd.concat(dicts)
Upvotes: 1