Inserting rows in df based on groupby using value of previous row

Question

I need to insert rows based on the column week, separately for each group of the column type. In some cases weeks are missing in the middle of the dataframe, at different positions, and I want to fill each gap with copies of the last existing row. In this case that means copies of week 7 to fill in weeks 8 and 9, and copies of week 11 to fill in weeks 12, 13 and 14. In this table you can see the jump from week 7 to 10 and from 11 to 15:

The ideal output would be as follows, with the values in column week incrementing correctly:

Below is the code I have. It inserts only one row and I'm confused as to why:

def middle_values(final : DataFrame) -> DataFrame:
    finaltemp= pd.DataFrame()
    out= pd.DataFrame()
    for i in range(0, len(final)):
        for f in range(1, 52 , 1):
            if final.iat[i,8]==  f and final.iat[i-1,8] != f-1 :
               if final.iat[i,8] > final.iat[i-1,8] and  final.iat[i,8] != (final.iat[i-1,8] - 1):
                    line = final.iloc[i-1]
                    c1 = final[0:i]
                    c2 = final[i:]
                    c1.loc[i]=line
                    concatinated = pd.concat([c1, c2])
                    concatinated.reset_index(inplace=True)
                    concatinated.iat[i,11] = concatinated.iat[i-1,11]
                    concatinated.iat[i,9]= f-1
                    finaltemp = finaltemp.append(concatinated)
    
    if 'type' in finaltemp.columns:
        for name, groups in finaltemp.groupby(["type"]):
            weeks = range(groups['week'].min(), groups['week'].max()+1)
            out = out.append(pd.merge(finaltemp, pd.Series(weeks, name='week'), how='right').ffill())
        out.drop_duplicates(subset=['project', 'week'], keep = 'first', inplace=True)
        out.drop_duplicates(inplace = True)
        out.sort_values(["Budget: Budget Name", "Budget Week"], ascending = (False, True), inplace=True)
        out.drop(['level_0'], axis = 1, inplace=True)
        out.reset_index(inplace=True)
        out.drop(['level_0'], axis = 1, inplace=True)
        return out
    else :
        return final

Daniel Wyatt · Accepted Answer

For the first part of your question, suppose we have a dataframe like the following:

import numpy as np
import pandas as pd

df = pd.DataFrame({"project": [1, 1, 1, 2, 2, 2], "week": [1, 3, 4, 1, 2, 4], "value": [12, 22, 18, 17, 18, 23]})

We can create a new MultiIndex that pairs every project with every week from 1 to 52, which gives us the additional rows that we need:

# Pair every project with every week from 1 to 52.
new_index = pd.MultiIndex.from_product(
    [sorted(df['project'].unique()), np.arange(1, 53)],
    names=['project', 'week'],
)

We can then apply this index to get the new dataframe that you need, with blanks (NaN) in the new rows:

df = df.set_index(['project', 'week']).reindex(new_index).reset_index().sort_values(['project', 'week'])
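As a quick sanity check of the reindex step, here is a minimal sketch on the same example data. The name `expanded` is only for illustration. It assumes there is at most one row per (project, week) pair, since reindexing a duplicated index raises an error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "project": [1, 1, 1, 2, 2, 2],
    "week": [1, 3, 4, 1, 2, 4],
    "value": [12, 22, 18, 17, 18, 23],
})

# Full grid: every project paired with every week 1..52.
new_index = pd.MultiIndex.from_product(
    [sorted(df["project"].unique()), np.arange(1, 53)],
    names=["project", "week"],
)

# "expanded" is an illustrative name for the reindexed frame.
expanded = df.set_index(["project", "week"]).reindex(new_index).reset_index()

# 2 projects x 52 weeks = 104 rows; only the 6 original rows carry a value.
print(len(expanded))
print(expanded["value"].notna().sum())

# Rows that were inserted by the reindex have NaN in "value".
print(expanded[expanded["value"].isna()].head())
```

Note that the inserted NaN values turn the integer column `value` into floats, so a cast back may be wanted once the blanks are filled.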

You would then need to apply a forward fill (using ffill) or a back fill (using bfill) together with groupby, so that values are only copied within the same project, to get the required values in the new rows.
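The fill step could look roughly like the sketch below, again on the example data. It forward fills within each project and then, as an extra step that is not part of the answer above, keeps only the weeks between each project's first and last observed week, which seems to be what the question asks for. The names `expanded`, `value_cols`, `first_week`, `last_week` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "project": [1, 1, 1, 2, 2, 2],
    "week": [1, 3, 4, 1, 2, 4],
    "value": [12, 22, 18, 17, 18, 23],
})

new_index = pd.MultiIndex.from_product(
    [sorted(df["project"].unique()), np.arange(1, 53)],
    names=["project", "week"],
)
expanded = df.set_index(["project", "week"]).reindex(new_index).reset_index()

# Forward fill within each project so values never leak across projects.
value_cols = [c for c in expanded.columns if c not in ("project", "week")]
expanded[value_cols] = expanded.groupby("project")[value_cols].ffill()

# Assumption: only gaps inside each project's observed week range are wanted,
# so drop the weeks before the first and after the last observed week.
first_week = expanded["project"].map(df.groupby("project")["week"].min())
last_week = expanded["project"].map(df.groupby("project")["week"].max())
in_range = (expanded["week"] >= first_week) & (expanded["week"] <= last_week)
result = expanded[in_range].reset_index(drop=True)

# The reindex turned integer columns into floats; cast back to the input dtypes.
result = result.astype(df.dtypes.to_dict())

print(result)
```

For project 1 this inserts week 2 as a copy of week 1, and for project 2 it inserts week 3 as a copy of week 2. Swapping `ffill` for `bfill` would copy from the next existing row instead.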
