Reputation: 5097
Fairly new to pandas and I have created a data frame called rollParametersDf:
rollParametersDf = pd.DataFrame(columns=['insampleStart','insampleEnd','outsampleStart','outsampleEnd'], index=[])
with the 4 column headings given. Which I would like to hold the reference dates for a study I am running. I want to add rows of data (one at a time) with the index name roll1, roll2..rolln that is created using the following code:
outsampleEnd = customCalender.iloc[[totalDaysAvailable]]
outsampleStart = customCalender.iloc[[totalDaysAvailable-outsampleLength+1]]
insampleEnd = customCalender.iloc[[totalDaysAvailable-outsampleLength]]
insampleStart = customCalender.iloc[[totalDaysAvailable-outsampleLength-insampleLength+1]]
print('roll',rollCount,'\t',outsampleEnd,'\t',outsampleStart,'\t',insampleEnd,'\t',insampleStart,'\t')
rollParametersDf.append({insampleStart,insampleEnd,outsampleStart,outsampleEnd})
I have tried using append but cannot get an individual row to append.
I would like the final dataframe to look like:
insampleStart insampleEnd outsampleStart outsampleEnd
roll1 1 5 6 8
roll2 2 6 7 9
:
rolln
Upvotes: 2
Views: 9403
Reputation: 10083
You give key-values pairs to append
df = pd.DataFrame({'insampleStart':[], 'insampleEnd':[], 'outsampleStart':[], 'outsampleEnd':[]})
df = df.append({'insampleStart':[1,2], 'insampleEnd':[5,6], 'outsampleStart':[6,7], 'outsampleEnd':[8,9]}, ignore_index=True)
Upvotes: 2
Reputation: 166
The pandas documentation has an example of appending rows to a DataFrame. This appending action is different from that of a list in that this appending action generates a new DataFrame. This means that for each append action you are rebuilding and reindexing the DataFrame which is pretty inefficient. Here is an example solution:
# create empty dataframe
columns=['insampleStart','insampleEnd','outsampleStart','outsampleEnd']
rollParametersDf = pd.DataFrame(columns=columns)
# loop through 5 rows and append them to the dataframe
for i in range(5):
# create some artificial data
data = np.random.normal(size=(1, len(columns)))
# append creates a new dataframe which makes this operation inefficient
# ignore_index causes reindexing on each call.
rollParametersDf = rollParametersDf.append(pd.DataFrame(data, columns=columns),
ignore_index=True)
print rollParametersDf
insampleStart insampleEnd outsampleStart outsampleEnd
0 2.297031 1.792745 0.436704 0.706682
1 0.984812 -0.417183 -1.828572 -0.034844
2 0.239083 -1.305873 0.092712 0.695459
3 -0.511505 -0.835284 -0.823365 -0.182080
4 0.609052 -1.916952 -0.907588 0.898772
Upvotes: 1