OS95

Reputation: 35

Inserting rows in df based on groupby using value of previous row

I need to insert rows based on the week column, grouped by type. In some cases weeks are missing in the middle of the dataframe, at different positions, and I want to fill those gaps with copies of the last existing row. In this case that means copies of week 7 to fill in weeks 8 and 9, and copies of week 11 to fill in weeks 12, 13 and 14. In this table you can see the jump from week 7 to 10 and from 11 to 15:

enter image description here

The desired output would be as follows, with the values in the week column incrementing without gaps:

enter image description here

Below is the code I have. It inserts only one row and I'm confused as to why:

def middle_values(final : DataFrame) -> DataFrame:
    finaltemp= pd.DataFrame()
    out= pd.DataFrame()
    for i in range(0, len(final)):
        for f in range(1, 52 , 1):
            if final.iat[i,8]==  f and final.iat[i-1,8] != f-1 :
               if final.iat[i,8] > final.iat[i-1,8] and  final.iat[i,8] != (final.iat[i-1,8] - 1):
                    line = final.iloc[i-1]
                    c1 = final[0:i]
                    c2 = final[i:]
                    c1.loc[i]=line
                    concatinated = pd.concat([c1, c2])
                    concatinated.reset_index(inplace=True)
                    concatinated.iat[i,11] = concatinated.iat[i-1,11]
                    concatinated.iat[i,9]= f-1
                    finaltemp = finaltemp.append(concatinated)
    
    if 'type' in finaltemp.columns:
        for name, groups in finaltemp.groupby(["type"]):
            weeks = range(groups['week'].min(), groups['week'].max()+1)
            out = out.append(pd.merge(finaltemp, pd.Series(weeks, name='week'), how='right').ffill())
        out.drop_duplicates(subset=['project', 'week'], keep = 'first', inplace=True)
        out.drop_duplicates(inplace = True)
        out.sort_values(["Budget: Budget Name", "Budget Week"], ascending = (False, True), inplace=True)
        out.drop(['level_0'], axis = 1, inplace=True)
        out.reset_index(inplace=True)
        out.drop(['level_0'], axis = 1, inplace=True)
        return out
    else :
        return final

Upvotes: 0

Views: 325

Answers (1)

Daniel Wyatt

Reputation: 1151

For the first part of your question, suppose we have a dataframe like the following:

df = pd.DataFrame({"project":[1,1,1,2,2,2], "week":[1,3,4,1,2,4], "value":[12,22,18,17,18,23]})

We can create a new MultiIndex containing every combination of project and week (weeks 1 to 52) to get the additional rows that we need:

new_index = pd.MultiIndex.from_product([sorted(df['project'].unique()), range(1, 53)],
                                       names=['project', 'week'])

We can then reindex with it to get the new dataframe that you need, with NaN in the newly inserted rows:

df = df.set_index(['project', 'week']).reindex(new_index).reset_index().sort_values(['project', 'week'])

You would then need to apply a forward fill (using ffill) or a backfill (using bfill) together with groupby, so that values are only carried over within the same project, to get the required values in the new rows.
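As a minimal sketch of that last step, assuming the same example dataframe and that only the value column needs filling: the forward fill is done per project with groupby. The names full, last_week and trimmed are made up for illustration, and the final trimming step is an optional addition for the case where you only want the gaps up to each project's last observed week rather than all 52 weeks.

```python
import pandas as pd

df = pd.DataFrame({
    "project": [1, 1, 1, 2, 2, 2],
    "week": [1, 3, 4, 1, 2, 4],
    "value": [12, 22, 18, 17, 18, 23],
})

# Full (project, week) grid for weeks 1..52, as in the steps above.
new_index = pd.MultiIndex.from_product(
    [sorted(df["project"].unique()), range(1, 53)],
    names=["project", "week"],
)
full = df.set_index(["project", "week"]).reindex(new_index).reset_index()

# Forward fill within each project so that a value is never copied
# from one project into the next. The column becomes float because
# the reindex introduced NaN.
full["value"] = full.groupby("project")["value"].ffill()

# Optional (an assumption about what is wanted): keep only the weeks up to
# each project's last observed week instead of the whole 1..52 range.
last_week = df.groupby("project")["week"].max()
trimmed = full[full["week"] <= full["project"].map(last_week)].reset_index(drop=True)

print(trimmed)
```

Weeks before a project's first observed week would stay NaN after ffill; whether to drop them or use bfill for those depends on your data.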

Upvotes: 1
