Reputation: 271
I want to repaet the rows in my df in a time sequence with forward filling.
Original df:
A B C Year
0 ABC 0 A 1950
1 CDE 1 A 1950
2 XYZ 1 B 1954
3 123 1 C 1954
4 X12 1 B 1956
5 123 1 D 1956
6 124 1 D 1956
Desired df:
A B C Year
0 ABC 0 A 1950
1 CDE 1 A 1950
2 ABC 0 A 1951
3 CDE 1 A 1951
4 ABC 0 A 1952
5 CDE 1 A 1952
6 ABC 0 A 1953
7 CDE 1 A 1953
8 XYZ 1 B 1954
9 123 1 C 1954
10 XYZ 1 B 1955
11 123 1 C 1955
12 X12 1 B 1956
13 123 1 D 1956
14 124 1 D 1956
I have tried converting the Year column to datetime and used a resampling yearwise with forward fill. But that didn't work as resample gives only one row for each year if resample year wise.
df.resample('YS').first().ffill().reset_index()
Desired df:
A B C Year
0 ABC 0 A 1950
1 CDE 1 A 1950
2 ABC 0 A 1951
3 CDE 1 A 1951
4 ABC 0 A 1952
5 CDE 1 A 1952
6 ABC 0 A 1953
7 CDE 1 A 1953
8 XYZ 1 B 1954
9 123 1 C 1954
10 XYZ 1 B 1955
11 123 1 C 1955
12 X12 1 B 1956
13 123 1 D 1956
14 124 1 D 1956
Upvotes: 2
Views: 1013
Reputation: 271
I took a different approach by pivoting & melting.. Seems to be working.. Any body sees an issue..?
data = {'year': ['2000', '2000', '2005', '2005', '2007', '2007', '2007', '2009'],
'country':['UK', 'US', 'FR','US','UK','FR','US','UK'],
'sales': [10, 21, 20, 10,12,20, 10,12],
'rep': ['john', 'john', 'claire','claire', 'kyle','kyle','kyle','amy']
}
df=pd.DataFrame(data)
year country sales rep
0 2000 UK 10 john
1 2000 US 21 john
2 2005 FR 20 claire
3 2005 US 10 claire
4 2007 UK 12 kyle
5 2007 FR 20 kyle
6 2007 US 10 kyle
7 2009 UK 12 amy
First doing a pivot...
dfp=pd.pivot_table(df,index=['country','rep'],values=['sales'],columns=['year']).fillna(0)
dfp=dfp.xs('sales', axis=1, drop_level=True)
year 2000 2005 2007 2009
country rep
FR claire 0.0 20.0 0.0 0.0
kyle 0.0 0.0 20.0 0.0
UK amy 0.0 0.0 0.0 12.0
john 10.0 0.0 0.0 0.0
kyle 0.0 0.0 12.0 0.0
US claire 0.0 10.0 0.0 0.0
john 21.0 0.0 0.0 0.0
kyle 0.0 0.0 10.0 0.0
Then a little logic to replicate the columns..
cols=dfp.columns.astype(int).values
dft=dfp.copy()
i=0
for col in cols :
if col != cols[-1]:
for newcol in range(col+1,cols[i+1]):
dft[str(newcol)]=dft[str(col)]
i+=1
year 2000 2005 2007 2009 2001 2002 2003 2004 2006 2008
country rep
FR claire 0.0 20.0 0.0 0.0 0.0 0.0 0.0 0.0 20.0 0.0
kyle 0.0 0.0 20.0 0.0 0.0 0.0 0.0 0.0 0.0 20.0
UK amy 0.0 0.0 0.0 12.0 0.0 0.0 0.0 0.0 0.0 0.0
john 10.0 0.0 0.0 0.0 10.0 10.0 10.0 10.0 0.0 0.0
kyle 0.0 0.0 12.0 0.0 0.0 0.0 0.0 0.0 0.0 12.0
US claire 0.0 10.0 0.0 0.0 0.0 0.0 0.0 0.0 10.0 0.0
john 21.0 0.0 0.0 0.0 21.0 21.0 21.0 21.0 0.0 0.0
kyle 0.0 0.0 10.0 0.0 0.0 0.0 0.0 0.0 0.0 10.0
Then did a melt get them back into original format..
dfm=dft.reset_index()
dfm=dfm.melt(id_vars=['country','rep'],value_vars=dfm.columns.values[2:],var_name='Year',value_name='sales')
dfm=dfm.loc[dfm.sales>0].reset_index(drop='True')
country rep Year sales
0 UK john 2000 10.0
1 US john 2000 21.0
2 FR claire 2005 20.0
3 US claire 2005 10.0
4 FR kyle 2007 20.0
5 UK kyle 2007 12.0
6 US kyle 2007 10.0
7 UK amy 2009 12.0
8 UK john 2001 10.0
9 US john 2001 21.0
10 UK john 2002 10.0
11 US john 2002 21.0
12 UK john 2003 10.0
13 US john 2003 21.0
14 UK john 2004 10.0
15 US john 2004 21.0
16 FR claire 2006 20.0
17 US claire 2006 10.0
18 FR kyle 2008 20.0
19 UK kyle 2008 12.0
20 US kyle 2008 10.0
Upvotes: 0
Reputation: 323226
I feel like this is a unnesting
problem
s=df.astype(str).groupby('Year').agg(list)
s.index=s.index.astype(int)
s1=s.reindex(np.arange(s.index.min(),s.index.max()+1),method='ffill')
yourdf=unnesting(s1,list('ABC')).reset_index()
yourdf
Out[117]:
Year A B C
0 1950 ABC 0 A
1 1950 CDE 1 A
2 1951 ABC 0 A
3 1951 CDE 1 A
4 1952 ABC 0 A
5 1952 CDE 1 A
6 1953 ABC 0 A
7 1953 CDE 1 A
8 1954 XYZ 1 B
9 1954 123 1 C
10 1955 XYZ 1 B
11 1955 123 1 C
12 1956 X12 1 B
13 1956 123 1 D
14 1956 124 1 D
def unnesting(df, explode):
idx = df.index.repeat(df[explode[0]].str.len())
df1 = pd.concat([
pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
df1.index = idx
return df1.join(df.drop(explode, 1), how='left')
Upvotes: 3
Reputation: 153460
You can try this:
df_out = df.set_index([pd.to_datetime(df['Year'], format='%Y'),'A','B','C'])\
.unstack([1,2,3]).resample('A').ffill()\
.stack([1,2,3]).reset_index([1,2,3])
df_out = df_out.assign(Year=pd.to_datetime(df_out.index).year).reset_index(drop=True)
df_out
Output:
A B C Year
0 ABC 0 A 1950
1 CDE 1 A 1950
2 ABC 0 A 1951
3 CDE 1 A 1951
4 ABC 0 A 1952
5 CDE 1 A 1952
6 ABC 0 A 1953
7 CDE 1 A 1953
8 123 1 C 1954
9 XYZ 1 B 1954
10 123 1 C 1955
11 XYZ 1 B 1955
12 123 1 D 1956
13 X12 1 B 1956
Upvotes: 2