Nicklas Koldkjær
Nicklas Koldkjær

Reputation: 61

For loop in pandas DataFrame data only saves last iteration to excel file

I am looping through the rows of the DataFrame data below and checking if e.g. the value in column Power > 0 and then i want to export these data into an excel file. This also works but it only writes the last iteration in the excel file, i have come to the conclusion that i need to use the append function in some way but i cannot figure out how to make it work.

1.  Location    UnitName    Timestamp           Power        Windspeed   Yaw
2.  Bull Creek  F10         01/11/2014 00:00:00 7,563641548  3,957911002 280,5478821     
3.  Bull Creek  F10         01/11/2014 00:20:00 60,73444748  4,24157236  280,4075012
4.  Bull Creek  F10         01/11/2014 00:30:00 63,15441132  4,241089859 280,3903809
5.  Bull Creek  F10         01/11/2014 00:40:00 59,09280396  4,38904965  280,4152527
6.  Bull Creek  F10         01/11/2014 00:50:00 69,26197052  4,374599175 280,3750916
7.  Bull Creek  F10         01/11/2014 01:00:00 101,0624237  5,343887005 280,5173035
8.  Bull Creek  F10         01/11/2014 01:10:00 122,7936935  5,183885235 280,4681702
9.  Bull Creek  F10         01/11/2014 01:20:00 86,57110596  5,046733923 280,3834534     
10. Bull Creek  F10         01/11/2014 01:40:00 16,74042702  3,024427626 280,1408386
11. Bull Creek  F10         01/11/2014 01:50:00 12,5870142   2,931351769 280,1185913
12. Bull Creek  F10         01/11/2014 02:00:00 -1,029753685 3,116549245 279,9686279
13. Bull Creek  F10         01/11/2014 02:10:00 13,35998058  3,448055706 279,8687134
14. Bull Creek  F10         01/11/2014 02:20:00 17,42461395  2,943588415 280,1383057
15. Bull Creek  F10         01/11/2014 02:30:00 -9,614940643 2,744164819 280,6514893   
16. Bull Creek  F10         01/11/2014 02:50:00 -11,01966286 3,554833538 283,1451416
17. Bull Creek  F10         01/11/2014 03:00:00 -4,383010387 4,279259377 283,3281555


import pandas as pd
import os

os.chdir('C:\Users\NIK\.spyder2\PythonScripts')

fileREF = 'FilterDataREF.xlsx'

dataREF = pd.read_excel(fileREF, sheetname='Sheet1')

filePCU = 'FilterDataPCU.xlsx'

dataPCU = pd.read_excel(filePCU, sheetname='Ark1')

for i in range(len(dataREF)):
    for j in range(len(dataPCU)):
      if dataREF['Timestamp'][i] == dataPCU['Timestamp'][j] and dataREF['Power'][i] > 0 and dataPCU['Power'][j] > 0:

    data_REF = pd.DataFrame([dataREF.loc[i]])
    data_PCU = pd.DataFrame([dataPCU.loc[j]])

    writer = pd.ExcelWriter('common_data.xlsx', engine='xlsxwriter')
    # Convert the dataframe to an XlsxWriter Excel object.
    data_REF.to_excel(writer, sheet_name='Sheet1')
    data_PCU.to_excel(writer, sheet_name='Sheet1', startcol=7)

    writer.save()

Upvotes: 1

Views: 2226

Answers (2)

Orenshi
Orenshi

Reputation: 1873

There's lots of ways to do this. Might I suggest... instead of looping over each row of the DataFrames, try joining or merging them?

merged_data = data_REF.merge(data_PCU, on=['Timestamp'], l_suffix='', r_suffix='PCU')

The above will inner join data_REF and data_PCU on the field Timestamp. I did this since I saw you had dataREF['Timestamp'][i] == dataPCU['Timestamp'][j] in your code. Note that l_suffix='' means that any columns in data_REF that are similarly named in data_PCU will remain. Meanwhile, columns similarly named for data_PCU will have _PCU added to the suffix. So Timestamp_PCU as an example

Once you have a merged DataFrame you can start doing something like

pow_gt_zero = (merged_data['Power'] > 0) & (merged_data['Power_PCU'] > 0)
valid_df = merged_data.loc[pow_gt_zero]

Using .loc above, you are getting a subset of the DataFrame where the condition pow_gt_zero is satisfied.

Now that you have the rows that meet your conditions, you can reference these Timestamps again. You can use them to subset the original DataFrames so that you may write them out to Excel.

data_REF = data_REF.loc[data_REF['Timestamp'].isin(valid_df['Timestamp'])
data_PCU = data_PCU.loc[data_PCU['Timestamp'].isin(valid_df['Timestamp'])

Upvotes: 1

jkm
jkm

Reputation: 704

It saves all the values, you're just overwriting the previous iteration's output every time.

There are several possible solutions. You can aggregate results by appending to a dataframe in each iteration, update your position in the excel and pass it as the start_row for to_excel in the next iteration, you could generate multiple excels by changing the filename, probably a whole lot of other options.

Upvotes: 1

Related Questions