Reputation: 1568
I continue to use pandas.dataframe.append, however appending in the for-loop continues to get written over and information is duplicated. The out put of the for-loop should look exactly like the original dataframe (I am not including functions to eliminate any complexity). Any help would be appreciated.
import pandas as pd
df = pd.DataFrame({
'date': ['2019-01-01','2019-01-01','2019-01-01',
'2019-02-01','2019-02-01','2019-02-01',
'2019-03-01','2019-03-01','2019-03-01',],
'Asset': ['Asset A', 'Asset A', 'Asset A', 'Asset B', 'Asset B', 'Asset B',
'Asset C', 'Asset C', 'Asset C'],
'Monthly Value': [2100, 8100, 1400, 1400, 3100, 1600, 2400, 2100, 2100]
})
print(df.sort_values(by=['Asset']))
date Asset Monthly Value
0 2019-01-01 Asset A 2100
1 2019-01-01 Asset A 8100
2 2019-01-01 Asset A 1400
3 2019-02-01 Asset B 1400
4 2019-02-01 Asset B 3100
5 2019-02-01 Asset B 1600
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
This for loop creates multiple appendages to the df and duplicates rows
assetlist = list(df['Asset'].unique())
for asset in assetlist:
df_subset = df[df['Asset'] == asset]
dfcopy = df_subset.copy()
newdf = newdf.append(dfcopy)
print(newdf)
This output is incorrect, it should look exactly like the original dataframe.
date Asset Monthly Value
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
0 2019-01-01 Asset A 2100
1 2019-01-01 Asset A 8100
2 2019-01-01 Asset A 1400
3 2019-02-01 Asset B 1400
4 2019-02-01 Asset B 3100
5 2019-02-01 Asset B 1600
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
0 2019-01-01 Asset A 2100
1 2019-01-01 Asset A 8100
2 2019-01-01 Asset A 1400
3 2019-02-01 Asset B 1400
4 2019-02-01 Asset B 3100
5 2019-02-01 Asset B 1600
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
0 2019-01-01 Asset A 2100
1 2019-01-01 Asset A 8100
2 2019-01-01 Asset A 1400
3 2019-02-01 Asset B 1400
4 2019-02-01 Asset B 3100
5 2019-02-01 Asset B 1600
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
0 2019-01-01 Asset A 2100
1 2019-01-01 Asset A 8100
2 2019-01-01 Asset A 1400
3 2019-02-01 Asset B 1400
4 2019-02-01 Asset B 3100
5 2019-02-01 Asset B 1600
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
Upvotes: 0
Views: 285
Reputation: 21709
You are missing one line I think:
assetlist = list(df['Asset'].unique())
newdf = pd.DataFrame() # <-- define it as a data frame
for asset in assetlist:
df_subset = df[df['Asset'] == asset]
dfcopy = df_subset.copy()
newdf = newdf.append(dfcopy)
print(newdf)
date Asset Monthly Value
0 2019-01-01 Asset A 2100
1 2019-01-01 Asset A 8100
2 2019-01-01 Asset A 1400
3 2019-02-01 Asset B 1400
4 2019-02-01 Asset B 3100
5 2019-02-01 Asset B 1600
6 2019-03-01 Asset C 2400
7 2019-03-01 Asset C 2100
8 2019-03-01 Asset C 2100
However, an easier way to do this is:
newdf = pd.concat([df.query("Asset == @asset") for asset in assetlist])
Upvotes: 2