Stan

Reputation: 884

Creating a new Pandas DataFrame from an Existing One

I have a pandas DataFrame with monthly data, as follows:

  df 

   id Month  val
   g1   Jan    1
   g1   Feb    5
   g1   Mar   61

What I want is the following:

I want to convert the DataFrame to a weekly structure, replacing the Month column with the weeks that fall in that month, assuming 4 weeks per month. The output should look like this:

   new_df 

     id  week  val
     g1     1    1
     g1     2    1
     g1     3    1
     g1     4    1
     g1     5    5
     g1     6    5
     g1     7    5
     g1     8    5
     g1     9   61
     g1    10   61
     g1    11   61
     g1    12   61

I tried applying the following function to the DataFrame, but it does not give that result:

SAMPLE CODE

    def myfun(mon):
        if mon == 'Jan':
            wk = list(range(1, 5))
        elif mon == 'Feb':
            wk = list(range(5, 9))
        else:
            wk = list(range(9, 13))
        return wk

    df['week'] = df.apply(lambda row: myfun(row['Month']), axis=1)
    del df['Month']

The output I get is the following, which is not what I want:

       id    val         week
       g1    1     [1, 2, 3, 4]
       g1    5     [5, 6, 7, 8]
       g1    61  [9, 10, 11, 12]

Also, is there a neat way to achieve this?

Help will be very much appreciated. Thanks.

Upvotes: 0

Views: 415

Answers (2)

Erfan

Reputation: 42916

We can use DataFrame.groupby and DataFrame.reindex with range(4) to extend each month to four rows. On the result we forward-fill (ffill) to replace the NaN values that the reindex introduces.

After that we convert Month to a month number with pandas.to_datetime, so we can sort on month. This is needed because groupby orders the groups alphabetically, which puts Feb before Jan.

Finally we create the column Week by taking the index and adding 1, and drop the Month column:

import pandas as pd

# Extend the index to 4 weeks for each month, then forward-fill the new rows
df_new = pd.concat([
    d.reset_index(drop=True).reindex(range(4))
    for n, d in df.groupby('Month')
], ignore_index=True).ffill()

# Convert the month abbreviation to a month number
df_new["Month"] = pd.to_datetime(df_new.Month, format='%b', errors='coerce').dt.month

# Now we can sort by month
df_new.sort_values('Month', inplace=True)

# Create the Week column
df_new['Week'] = df_new.reset_index(drop=True).index + 1

# Drop the Month column since we don't need it anymore
df_new.drop('Month', axis=1, inplace=True)
df_new.reset_index(drop=True, inplace=True)

Which yields:

print(df_new)
    id   val  Week
0   g1   1.0     1
1   g1   1.0     2
2   g1   1.0     3
3   g1   1.0     4
4   g1   5.0     5
5   g1   5.0     6
6   g1   5.0     7
7   g1   5.0     8
8   g1  61.0     9
9   g1  61.0    10
10  g1  61.0    11
11  g1  61.0    12
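
The reindex and forward-fill route turns val into floats, as the output above shows, because the intermediate rows hold NaN. A more compact variant that keeps the integer dtype is sketched below. It is not the code from this answer: it assumes exactly 4 weeks per month (as in the question), English month abbreviations, and one row per id and Month. The names WEEKS_PER_MONTH, out, month_num and offset are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({"id": ["g1", "g1", "g1"],
                   "Month": ["Jan", "Feb", "Mar"],
                   "val": [1, 5, 61]})

WEEKS_PER_MONTH = 4  # assumption from the question: every month has exactly 4 weeks

# Repeat every row four times and renumber the index 0..n-1
out = df.loc[df.index.repeat(WEEKS_PER_MONTH)].reset_index(drop=True)

# 'Jan' -> 1, 'Feb' -> 2, ... (assumes English month abbreviations)
month_num = pd.to_datetime(out["Month"], format="%b").dt.month

# Position of each row within its (id, Month) group: 0, 1, 2, 3
offset = out.groupby(["id", "Month"]).cumcount()

# First week of the month plus the offset within the month
out["week"] = (month_num - 1) * WEEKS_PER_MONTH + offset + 1
new_df = out[["id", "week", "val"]]
print(new_df)
```

Because the week number is computed from the month number rather than from the row position, no sorting step is needed, and a month missing from the data simply leaves a gap in the week numbers.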

Upvotes: 1

Amany

Reputation: 321

Try this:

import pandas as pd

# month abbreviation -> month number (keys must match the values in df['Month'])
month = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
         'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
rows = []  # collect the output rows here
for index, row in df.iterrows():  # for each row in df
    first_week = (month[row['Month']] - 1) * 4 + 1  # first week number of this month
    for week in range(first_week, first_week + 4):  # four weeks per month
        rows.append({'id': row['id'], 'week': week, 'val': row['val']})
# DataFrame.append was removed in pandas 2.0, so build the frame once from the list
new_df = pd.DataFrame(rows, columns=['id', 'week', 'val'])
print(new_df)
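
The list-valued week column from the question is also only one step away from the desired output: DataFrame.explode (available since pandas 0.25) turns each list element into its own row. A minimal sketch under the same 4-weeks-per-month assumption follows. MONTHS and weeks_for are hypothetical names introduced here, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({"id": ["g1", "g1", "g1"],
                   "Month": ["Jan", "Feb", "Mar"],
                   "val": [1, 5, 61]})

# Hypothetical lookup list: position in the list gives the month number minus 1
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def weeks_for(mon):
    """Return the four week numbers assigned to a month abbreviation."""
    start = MONTHS.index(mon) * 4 + 1
    return list(range(start, start + 4))

# One list of weeks per row, as in the question's attempt
df["week"] = df["Month"].map(weeks_for)

# explode() gives each list element its own row; the result is object dtype,
# so cast the week column back to int
new_df = (df.drop(columns="Month")
            .explode("week")
            .astype({"week": int})
            .reset_index(drop=True))[["id", "week", "val"]]
print(new_df)
```

This avoids the explicit Python loop over rows while staying close to the approach the question started with.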

Upvotes: 1
