I have 2 columns in a Pandas DataFrame and a dictionary generator function. The function takes rolling windows of rows from the DataFrame as input and returns a dictionary. The dictionary's keys should then be added to the existing DataFrame as new columns, and its values as rows, starting from a specific index. The sample DataFrame is:
+-------+---+---+
| Index | A | B |
+-------+---+---+
| 0 | 2 | 4 |
| 1 | 5 | 6 |
| 2 | 1 | 7 |
| 3 | 4 | 6 |
| 4 | 2 | 7 |
| 5 | 8 | 4 |
| 6 | 3 | 1 |
| 7 | 8 | 2 |
+-------+---+---+
The code that takes the input rows (the window) from the DataFrame is below:
def stack(df, window=3):
    for i in range(0, df.shape[0] - window):
        dfp = df[i:i+window]
        mp = addition(dfp)  # a dict generator function to add 3 previous values of column a and b and give output with a single dict {'C': value, 'D': value}
        for key, value in mp.items():  # to assign keys as column and values as rows
            df.loc['i+window', key] = value  # to assign rows from a specific index -3
    return df
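As far as I can tell, the NaNs come from `df.loc['i+window', key]`: the quotes make `'i+window'` a literal string label, so every iteration writes into the same new row named `i+window` instead of rows 2, 3, 4, and so on. That extra row ends up holding the last dictionary while every real row stays NaN. Below is a minimal corrected sketch of the loop. `addition` is a hypothetical stand-in (the original helper isn't shown) that simply sums columns A and B over the window. Each result is written to the label of the window's last row (`i + window - 1`), and the range uses `+ 1` so the final window (rows 5 to 7) is also filled; drop the `+ 1` to get exactly the five dictionaries listed in the question.

```python
import pandas as pd


def addition(dfp):
    # Hypothetical stand-in for the asker's helper: sum the window's
    # A and B columns and return them as {'C': ..., 'D': ...}.
    return {'C': dfp['A'].sum(), 'D': dfp['B'].sum()}


def stack(df, window=3):
    df = df.copy()  # work on a copy so the caller's frame is not mutated
    # "+ 1" so the last complete window (the final `window` rows) is included
    for i in range(df.shape[0] - window + 1):
        dfp = df.iloc[i:i + window]  # rows i .. i + window - 1
        mp = addition(dfp)
        for key, value in mp.items():
            # Use the integer label of the window's last row, not the
            # string 'i+window' (which creates one extra row with that name).
            df.loc[df.index[i + window - 1], key] = value
    return df


df = pd.DataFrame({'A': [2, 5, 1, 4, 2, 8, 3, 8],
                   'B': [4, 6, 7, 6, 7, 4, 1, 2]})
print(stack(df))
```

Assigning one cell at a time with `.loc` inside a Python loop tends to be slow on large frames, which is why a vectorized approach is worth looking at for the speed part of the question.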
The for loop produces one dictionary per iteration; they look like this:
{'C': 8, 'D': 17} #1st loop
{'C': 10, 'D': 19} #2nd loop
{'C': 7, 'D': 20} #3rd loop
{'C': 14, 'D': 17} #4th loop
{'C': 13, 'D': 12} #5th loop
But there is an error in the output: the function above (which is also slow) turns every row value into NaN except the last one. Instead, each dictionary should be added to the DataFrame row by row, one per iteration, and the final DataFrame should look like this:
+-------+---+---+-----+-----+
| Index | A | B | C | D |
+-------+---+---+-----+-----+
| 0 | 2 | 4 | NaN | NaN |
| 1 | 5 | 6 | NaN | NaN |
| 2 | 1 | 7 | 8 | 17 |
| 3 | 4 | 6 | 10 | 19 |
| 4 | 2 | 7 | 7 | 20 |
| 5 | 8 | 4 | 14 | 17 |
| 6 | 3 | 1 | 13 | 12 |
+-------+---+---+-----+-----+
Besides getting the expected output above, I also want to make the loop as fast as possible. Please help me understand where I am going wrong, and pardon my bad English.
Another option instead of looping:
df.combine_first(pd.DataFrame(dd_list, index=range(window,len(dd_list)+window)))
Update, with what I think you are asking regarding adding each dictionary to the DataFrame:
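import pandas as pd
df = pd.DataFrame({'A': [2, 5, 1, 4, 2, 8, 3, 8],
                   'B': [4, 6, 7, 6, 7, 4, 1, 2]})
dd_list = [{'C': 8, 'D': 17},   #1st loop
           {'C': 10, 'D': 19},  #2nd loop
           {'C': 7, 'D': 20},   #3rd loop
           {'C': 14, 'D': 17},  #4th loop
           {'C': 13, 'D': 12}]  #5th loop
window = 2  # label of the last row of the first 3-row window
for n, i in enumerate(dd_list):
    df = df.combine_first(pd.DataFrame(i, index=[n+window]))
print(df)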
Output:
A B C D
0 2 4 NaN NaN
1 5 6 NaN NaN
2 1 7 8.0 17.0
3 4 6 10.0 19.0
4 2 7 7.0 20.0
5 8 4 14.0 17.0
6 3 1 13.0 12.0
7 8 2 NaN NaN
As @QuangHoang was suggesting, you can generate your output directly, without the dictionary function, with this method:
df.join(df.rolling(3).sum().rename(columns={'A':'C', 'B':'D'}))
Output:
A B C D
Index
0 2 4 NaN NaN
1 5 6 NaN NaN
2 1 7 8.0 17.0
3 4 6 10.0 19.0
4 2 7 7.0 20.0
5 8 4 14.0 17.0
6 3 1 13.0 12.0
7 8 2 19.0 7.0
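`rolling(3).sum()` works here because the helper is a plain per-column sum. If the real helper does something `rolling` can't express, a sketch that still avoids growing the frame cell by cell is to call the helper once per window, collect the dictionaries in a list, build one DataFrame indexed by each window's last row, and join it once. `addition` below is a hypothetical stand-in that sums A and B; swap in the real function.

```python
import pandas as pd


def addition(dfp):
    # Hypothetical helper: any function mapping a window (DataFrame)
    # to a dict of new column values would work here.
    return {'C': dfp['A'].sum(), 'D': dfp['B'].sum()}


def stack_fast(df, window=3):
    # Call the helper once per complete window and keep the dicts in a list
    rows = [addition(df.iloc[i:i + window])
            for i in range(len(df) - window + 1)]
    # Each dict belongs to the last row of its window
    new_cols = pd.DataFrame(rows, index=df.index[window - 1:])
    # A single join aligns on the index; rows before the first full window get NaN
    return df.join(new_cols)


df = pd.DataFrame({'A': [2, 5, 1, 4, 2, 8, 3, 8],
                   'B': [4, 6, 7, 6, 7, 4, 1, 2]})
print(stack_fast(df))
```

This still makes one Python call per window, so for a plain sum the `rolling` version above should remain the faster choice; the list-then-join pattern mainly removes the repeated `.loc` enlargement.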