Reputation: 968
I have a dataframe:
df
Name Date ID Amount
0 Faye 2018-12-31 A 2
1 Faye 2019-03-31 A 1
2 Faye 2019-06-30 A 5
3 Faye 2019-09-30 B 2
4 Faye 2019-09-30 C 2
5 Faye 2019-12-31 A 4
6 Faye 2020-03-01 A 1
7 Faye 2020-03-01 B 1
8 Mike 2018-12-31 A 4
9 Mike 2019-03-31 A 4
10 Mike 2019-06-30 B 3
And for each Name
, Date
, ID
, group I calculate the % change of Amount
from the previous Date
in a new column. If there was no previous entry, then I add New
:
df['% Change'] = (df.sort_values('Date').groupby(['Name', 'ID']).Amount.pct_change())
df['% Change'] = df['% Change'].fillna('New')
But I also want to have an entry for the opposite case, where a Name
, Date
, ID
, group did exist previously, but doesn't in the next date; so that the output looks like:
Name Date ID Amount % Change
0 Faye 2018-12-31 A 2 New
1 Faye 2019-03-31 A 1 -0.5
2 Faye 2019-06-30 A 5 4
3 Faye 2019-09-30 A 0 Sold
4 Faye 2019-09-30 B 2 New
5 Faye 2019-09-30 C 2 New
6 Faye 2019-12-31 A 4 New
7 Faye 2020-03-01 A 1 -0.75
8 Faye 2020-03-01 B 1 -0.5
9 Mike 2018-12-31 A 4 New
10 Mike 2019-03-31 A 4 0
11 Mike 2019-06-30 A 0 Sold
12 Mike 2019-06-30 B 3 New
In case it helps, I am trying to emulate how this site handles such cases.
Upvotes: 3
Views: 90
Reputation: 12503
Here's a solution:
# run you original code to ide
df['% Change'] = (df.sort_values('Date').groupby(['Name', 'ID']).Amount.pct_change())
df['% Change'] = df['% Change'].fillna('New')
# Create a dataframe of all te dates.
all_dates = pd.DataFrame({"Date": df["Date"].unique()})
all_dates["one"] = 1
# Create a dasta frame of all the possible recored (all combinations of id-name-date)
name_ids = df[["Name", "ID"]].drop_duplicates()
name_ids["one"] = 1
all_possible_records = pd.merge(all_dates, name_ids, on="one")
all_possible_records = pd.merge(all_possible_records, df, on = ["Date", "Name", "ID"], how ="left")
all_possible_records.drop("one", axis = "columns", inplace = True)
all_possible_records.sort_values(["Name", "ID", "Date"], inplace=True)
# For every record, shift 1 to see if it had any value in the previous querter.
all_possible_records["prev_q"] = all_possible_records.groupby(["Name", "ID"]).shift(1)["Amount"]
# records in which change is NaN - but there was a value in the previous querter - are 'sold'
all_possible_records.loc[all_possible_records["% Change"].isna() & all_possible_records.prev_q.notna(), "% Change"]="Sold"
# Drop redundent records.
res = all_possible_records.dropna(axis="rows", subset=["% Change"])
res
The result is:
Upvotes: 1