I have a pandas DataFrame with month-based data, as follows:
df
id Month val
g1 Jan 1
g1 Feb 5
g1 Mar 61
What I want is to convert the DataFrame to a weekly structure, with the Month column replaced (or not) by all the weeks that fall in that month, assuming 4 weeks for each month. The output should look like this:
new_df
id week val
g1 1 1
g1 2 1
g1 3 1
g1 4 1
g1 5 5
g1 6 5
g1 7 5
g1 8 5
g1 9 61
g1 10 61
g1 11 61
g1 12 61
I tried applying the following function to the DataFrame, but it does not work:
def myfun(mon):
    if mon == 'Jan':
        wk = list(range(1,5))
    elif mon == 'Feb':
        wk = list(range(5,9))
    else:
        wk = list(range(9,13))
    return wk

df['week'] = df.apply(lambda row: myfun(row['Month']), axis=1)
del df['Month']
The output I get is the following, which is not what I want:
id val week
g1 1 [1, 2, 3, 4]
g1 5 [5, 6, 7, 8]
g1 61 [9, 10, 11, 12]
Also, is there a neat way to achieve this?
Any help would be much appreciated. Thanks.
We can use DataFrame.groupby and DataFrame.reindex with range(4) to extend each month to four rows. On the output we forward fill (ffill) to replace the NaN values.
After that we convert Month to its month number with pandas.to_datetime, so we can sort by month.
Finally we create the Week column by taking the index and adding 1, and drop the Month column:
import pandas as pd

# Extend each month's group to 4 rows (one per week), then forward fill the NaN rows
df_new = pd.concat([
    d.reset_index(drop=True).reindex(range(4))
    for n, d in df.groupby('Month')
], ignore_index=True).ffill()

# Convert the month abbreviation to its month number
df_new["Month"] = pd.to_datetime(df_new.Month, format='%b', errors='coerce').dt.month

# Now we can sort by month
df_new.sort_values('Month', inplace=True)

# Create the Week column
df_new['Week'] = df_new.reset_index(drop=True).index + 1

# Drop the Month column since we don't need it anymore
df_new.drop('Month', axis=1, inplace=True)
df_new.reset_index(drop=True, inplace=True)
Which yields:
print(df_new)
id val Week
0 g1 1.0 1
1 g1 1.0 2
2 g1 1.0 3
3 g1 1.0 4
4 g1 5.0 5
5 g1 5.0 6
6 g1 5.0 7
7 g1 5.0 8
8 g1 61.0 9
9 g1 61.0 10
10 g1 61.0 11
11 g1 61.0 12
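As a shorter variant of the same idea, here is a minimal sketch that repeats each row with Index.repeat and numbers the weeks with groupby(...).cumcount(). It assumes English three-letter month abbreviations, exactly 4 weeks per month (as stated in the question), and at most one row per (id, Month) pair; the name WEEKS_PER_MONTH is only illustrative. Unlike the reindex approach, val keeps its integer dtype because no NaN rows are created.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": ["g1", "g1", "g1"],
    "Month": ["Jan", "Feb", "Mar"],
    "val": [1, 5, 61],
})

WEEKS_PER_MONTH = 4  # assumption taken from the question

# Repeat every row 4 times and give the result a fresh 0..n-1 index
new_df = df.loc[df.index.repeat(WEEKS_PER_MONTH)].reset_index(drop=True)

# Month number parsed from the abbreviation (Jan -> 1, Feb -> 2, ...)
month_num = pd.to_datetime(new_df["Month"], format="%b").dt.month

# Position of each repeated row inside its (id, Month) group: 0, 1, 2, 3
offset = new_df.groupby(["id", "Month"]).cumcount()

# First week of the month plus the offset inside the month
new_df["week"] = (month_num - 1) * WEEKS_PER_MONTH + 1 + offset

new_df = new_df.drop(columns="Month")[["id", "week", "val"]]
print(new_df)
```

Because the week number is computed from the month number, the rows do not need to be sorted by month first.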
Try this:
import pandas as pd

# Month abbreviation -> month number
month = {'Jan':1,'Feb':2,'Mar':3,'Apr':4,'May':5,'Jun':6,'Jul':7,'Aug':8,'Sep':9,'Oct':10,'Nov':11,'Dec':12}
rows = []  # collect the new rows here
for index, row in df.iterrows():  # for each row in df
    month_num = (month[row['Month']] - 1) * 4 + 1  # starting week number for this month
    for i in range(4):  # iterate four times
        # add the row with its week value
        rows.append({'id': row['id'], 'week': month_num, 'val': row['val']})
        month_num += 1  # increment the week number
new_df = pd.DataFrame(rows, columns=['id', 'week', 'val'])  # create the new dataframe
print(new_df)
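The list-valued week column produced by the attempt in the question can also be unpacked directly with DataFrame.explode (available since pandas 0.25), which turns each list element into its own row. The sketch below is one possible way to do that, not the only one; the helper weeks_for is a hypothetical name, and it assumes English three-letter month abbreviations and 4 weeks per month.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": ["g1", "g1", "g1"],
    "Month": ["Jan", "Feb", "Mar"],
    "val": [1, 5, 61],
})

def weeks_for(month_abbr, weeks_per_month=4):
    """Return the list of week numbers for a month abbreviation (hypothetical helper)."""
    month_num = pd.to_datetime(month_abbr, format="%b").month
    start = (month_num - 1) * weeks_per_month + 1
    return list(range(start, start + weeks_per_month))

exploded = (
    df.assign(week=df["Month"].map(weeks_for))  # one list of weeks per row
      .explode("week")                          # one row per list element
      .drop(columns="Month")
      .reset_index(drop=True)
)
# explode leaves the column with object dtype, so cast it back to integers
exploded["week"] = exploded["week"].astype(int)
exploded = exploded[["id", "week", "val"]]
print(exploded)
```

This keeps the original function-per-month idea and only adds the explode step that was missing.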