Creating a new Pandas DataFrame from an Existing One

Question

I have a pandas DataFrame with month-based data, as follows:

  df 

   id Month  val
   g1   Jan    1
   g1   Feb    5
   g1   Mar   61
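To make the examples reproducible, the sample frame above can be built like this (a minimal sketch that copies the question's three rows):

```python
import pandas as pd

# Sample input from the question: one row per (id, Month)
df = pd.DataFrame({
    "id": ["g1", "g1", "g1"],
    "Month": ["Jan", "Feb", "Mar"],
    "val": [1, 5, 61],
})
print(df)
```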

What I want is the following:

I want to convert the DataFrame to a weekly structure, replacing the Month column (or keeping it) with all the weeks that fall in that month. Assuming 4 weeks per month, the output should look like this:

   new_df 

     id  week  val
     g1     1    1
     g1     2    1
     g1     3    1
     g1     4    1
     g1     5    5
     g1     6    5
     g1     7    5
     g1     8    5
     g1     9   61
     g1    10   61
     g1    11   61
     g1    12   61

I have tried writing the following function and applying it to the DataFrame, but it does not work:

SAMPLE CODE

def myfun(mon):
    if mon == 'Jan':
        wk = list(range(1,5))
    elif mon == 'Feb':
        wk = list(range(5,9))
    else:
        wk = list(range(9,13))
    return wk

df['week'] = df.apply(lambda row: myfun(row['Month']), axis=1)
del df['Month']

The output I am getting is as follows which is not what I want:

       id    val         week
       g1    1     [1, 2, 3, 4]
       g1    5     [5, 6, 7, 8]
       g1    61  [9, 10, 11, 12]
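The attempt above yields lists because the function returns one list per row, so each cell of week holds four numbers. One way to finish this approach, assuming pandas 0.25 or later (where DataFrame.explode exists), is to explode the list column into one row per week. This is a sketch, not the accepted answer's method; it also swaps apply(axis=1) for the simpler Series.map:

```python
import pandas as pd

df = pd.DataFrame({"id": ["g1"] * 3, "Month": ["Jan", "Feb", "Mar"], "val": [1, 5, 61]})

def myfun(mon):
    # Same month -> week mapping as in the question
    if mon == "Jan":
        return list(range(1, 5))
    elif mon == "Feb":
        return list(range(5, 9))
    return list(range(9, 13))

# Each cell of 'week' holds a list of four week numbers
df["week"] = df["Month"].map(myfun)

# explode() gives every list element its own row, repeating id and val
new_df = (
    df.explode("week")
      .drop(columns="Month")
      .astype({"week": int})  # explode leaves an object column
      .reset_index(drop=True)[["id", "week", "val"]]
)
print(new_df)
```

Because no NaN rows are created, val keeps its integer dtype here.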

Also, is there a neater way to achieve this?

Any help would be much appreciated. Thanks.

Erfan · Accepted Answer

We can use DataFrame.groupby and DataFrame.reindex with range(4) to give every month four rows. On the result we forward-fill (ffill) the new rows to replace the NaN values.
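To see what the reindex-and-fill step does, here is a small sketch on a single month; the one-row frame d is a stand-in for one group yielded by df.groupby('Month'):

```python
import pandas as pd

# One month group, as produced by df.groupby('Month')
d = pd.DataFrame({"id": ["g1"], "Month": ["Jan"], "val": [1]})

# reindex(range(4)) keeps row 0 and adds rows 1-3 filled with NaN
expanded = d.reset_index(drop=True).reindex(range(4))

# ffill() copies the last valid row downwards into the NaN rows
filled = expanded.ffill()
print(filled)
```

The NaN rows are also why val turns into a float column (1.0 instead of 1) in the final output.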

After that we convert Month to a month number with pandas.to_datetime, so that we can sort the months chronologically instead of alphabetically.
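The '%b' directive parses abbreviated month names. Parsing is locale-dependent, so English abbreviations are assumed here; since no year is given, it defaults to 1900 and only the month is kept:

```python
import pandas as pd

months = pd.Series(["Jan", "Feb", "Mar", "Feb"])

# Parse the abbreviations, then keep only the month number (1-12)
month_num = pd.to_datetime(months, format="%b", errors="coerce").dt.month
print(month_num.tolist())  # [1, 2, 3, 2]
```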

Finally we create the Week column by numbering the sorted rows from 1, and drop the Month column:

import pandas as pd

# Extend each month group to 4 rows (one per week) and forward-fill the new rows
df_new = pd.concat([
    d.reset_index(drop=True).reindex(range(4))
    for _, d in df.groupby('Month')
], ignore_index=True).ffill()

# Convert the month abbreviation to a month number (Jan -> 1, ...)
df_new["Month"] = pd.to_datetime(df_new.Month, format='%b', errors='coerce').dt.month

# groupby sorted the months alphabetically (Feb, Jan, Mar); restore calendar order
df_new.sort_values('Month', inplace=True)

# Create the Week column: 1, 2, ..., n in the sorted row order
df_new['Week'] = range(1, len(df_new) + 1)

# Drop the Month column since we don't need it anymore
df_new.drop('Month', axis=1, inplace=True)
df_new.reset_index(drop=True, inplace=True)

Which yields:

print(df_new)
    id   val  Week
0   g1   1.0     1
1   g1   1.0     2
2   g1   1.0     3
3   g1   1.0     4
4   g1   5.0     5
5   g1   5.0     6
6   g1   5.0     7
7   g1   5.0     8
8   g1  61.0     9
9   g1  61.0    10
10  g1  61.0    11
11  g1  61.0    12
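A shorter alternative, not part of the accepted answer, is to repeat each row with Index.repeat and compute the week from the month number. This sketch assumes, as the question does, exactly 4 weeks per month and English month abbreviations; WEEKS_PER_MONTH is a name introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({"id": ["g1"] * 3, "Month": ["Jan", "Feb", "Mar"], "val": [1, 5, 61]})

WEEKS_PER_MONTH = 4  # assumption taken from the question

# Month name -> month number (1-12)
month_num = pd.to_datetime(df["Month"], format="%b").dt.month

# Repeat every row 4 times; the original index labels are kept (0,0,0,0,1,...)
out = df.loc[df.index.repeat(WEEKS_PER_MONTH)].copy()

# Week = (month - 1) * 4 + position of the row within its month (1..4)
offset = (month_num - 1) * WEEKS_PER_MONTH
position = out.groupby(level=0).cumcount()
out["week"] = offset.loc[out.index].to_numpy() + position.to_numpy() + 1

out = out.drop(columns="Month").reset_index(drop=True)[["id", "week", "val"]]
print(out)
```

Because the week is derived from the month number rather than from the row order, no sorting step is needed, and val stays an integer since no NaN rows are introduced.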
