user53526356
user53526356

Reputation: 968

How to Add Record of Missing Row to Dataframe

I have a dataframe: df

    Name        Date ID  Amount
0   Faye  2018-12-31  A       2
1   Faye  2019-03-31  A       1
2   Faye  2019-06-30  A       5
3   Faye  2019-09-30  B       2
4   Faye  2019-09-30  C       2
5   Faye  2019-12-31  A       4
6   Faye  2020-03-01  A       1
7   Faye  2020-03-01  B       1
8   Mike  2018-12-31  A       4
9   Mike  2019-03-31  A       4
10  Mike  2019-06-30  B       3

And for each Name, Date, ID, group I calculate the % change of Amount from the previous Date in a new column. If there was no previous entry, then I add New:

df['% Change'] = (df.sort_values('Date').groupby(['Name', 'ID']).Amount.pct_change())
df['% Change'] = df['% Change'].fillna('New')

But I also want to have an entry for the opposite case, where a Name, Date, ID, group did exist previously, but doesn't in the next date; so that the output looks like:

    Name        Date ID  Amount % Change
0   Faye  2018-12-31  A       2      New
1   Faye  2019-03-31  A       1     -0.5
2   Faye  2019-06-30  A       5        4
3   Faye  2019-09-30  A       0     Sold
4   Faye  2019-09-30  B       2      New
5   Faye  2019-09-30  C       2      New
6   Faye  2019-12-31  A       4      New
7   Faye  2020-03-01  A       1    -0.75
8   Faye  2020-03-01  B       1     -0.5
9   Mike  2018-12-31  A       4      New
10  Mike  2019-03-31  A       4        0
11  Mike  2019-06-30  A       0     Sold
12  Mike  2019-06-30  B       3      New

In case it helps, I am trying to emulate how this site handles such cases.

Upvotes: 3

Views: 90

Answers (1)

Roy2012
Roy2012

Reputation: 12503

Here's a solution:

# run you original code to ide
df['% Change'] = (df.sort_values('Date').groupby(['Name', 'ID']).Amount.pct_change())
df['% Change'] = df['% Change'].fillna('New')

# Create a dataframe of all te dates. 
all_dates = pd.DataFrame({"Date": df["Date"].unique()}) 
all_dates["one"] = 1

# Create a dasta frame of all the possible recored (all combinations of id-name-date)
name_ids = df[["Name", "ID"]].drop_duplicates()
name_ids["one"] = 1

all_possible_records = pd.merge(all_dates, name_ids, on="one")
all_possible_records = pd.merge(all_possible_records, df, on = ["Date", "Name", "ID"], how ="left")
all_possible_records.drop("one", axis = "columns", inplace = True)
all_possible_records.sort_values(["Name", "ID", "Date"], inplace=True)

# For every record, shift 1 to see if it had any value in the previous querter. 
all_possible_records["prev_q"] = all_possible_records.groupby(["Name", "ID"]).shift(1)["Amount"]

# records in which change is NaN - but there was a value in the previous querter - are 'sold'
all_possible_records.loc[all_possible_records["% Change"].isna() & all_possible_records.prev_q.notna(), "% Change"]="Sold"

# Drop redundent records. 
res = all_possible_records.dropna(axis="rows", subset=["% Change"])
res

The result is:

enter image description here

Upvotes: 1

Related Questions