Reputation: 57
So I have a dataframe containing a time series, like:
NAME, DATE, VACATION (True/False)
Eric, 1/1/12, False
Eric, 1/2/12, True
...
Bob, 4/2/12, True
Bob, 4/3/12, False
Basically what I need out is a dict or something I can template in Jinja2 with the following format
{'eric':
vacations: [1/2/12, ... ],
subtotals: {
'2012': {
'total': 1
'perweek': [1, 0, 0, ... ] (LEN = 52, week numbers)
'perquarter': [1, 0, 0, 0] (LEN = 4)
},
'2013': { ... }
},
'bob': ...
}
Basically I need to get subtotals of vacations per user per year broken down into total per year, per week, per quarter.....
Is there a quick way to do that and convert it into a dict so I could use Jinja2 to template it out?
I know I can do groupby, etc.... but I could only figure out how to do per week separated from per quarter, per year, all as different groupbys and then re-assembly them into a dict.
Is there a way to do all of that at once?
Currently I am doing the following...
perweek = df.groupby(['name', 'date'].count.resample('W').count
for i in perweek.iter
add to dict
perquarter = df.groupby(['name', 'date']).count.resample.count
for i in perquarter.iter
add to dict
peryear ... etc
Basically I group, then count, then resample on.. then iterate over those and add them to the dict. Do the same thing for each of the resamples I want. It works... I was just wondering if there was a way to do it without multiple iterators.
Upvotes: 0
Views: 548
Reputation: 1392
I have a complex solution.
def get_vacation_info(x):
vacations = sorted(x["DATE"][x["VACATION"]=="True"].tolist())
df_x = pd.DataFrame(vacations, columns=["DATE"]).apply(pd.to_datetime)
df_x["year"] = df_x.DATE.dt.year
df_x["week"] = df_x.DATE.dt.weekofyear
df_x["quarter"] = df_x.DATE.dt.quarter
def get_subtotals(y):
total = len(y)
perweek = y["week"].value_counts().reset_index()
weekindex = pd.DataFrame([i for i in range(1, 53)], columns=["index"])
merged_week = pd.merge(weekindex, perweek, on=["index"], how="left").fillna(0).astype(int)
perweek = merged_week["week"].tolist()
perquarter = y["quarter"].value_counts().reset_index()
quarterindex = pd.DataFrame([i for i in range(1, 5)], columns=["index"])
merged_quater = pd.merge(quarterindex, perquarter, on=["index"], how="left").fillna(0).astype(int)
perquarter = merged_quater["quarter"].tolist()
return {"total": total, "perweek": perweek, "perquarter": perquarter}
subtotals = df_x.groupby("year").apply(lambda y: get_subtotals(y)).to_json()
return {"vacations": vacations, "subtotals": subtotals}
result = df.groupby("NAME").apply(lambda x: get_vacation_info(x))
Upvotes: 1