OTRAY
OTRAY

Reputation: 115

How to iterate over every data entry and subtract each one from another? [Python]

So i have the following (i'm working with pandas btw.) when i print the head. As you can see i already get back the first entry of every day.

                     first
Timestamp              
2021-03-11  596319.8125
2021-03-12  596349.5625
2021-03-13  602205.1875
2021-03-14  609445.5625
2021-03-15  609479.6250

And now i want to get the yield from every day, which means i have to subtract the value (in this case Energy = first) of day 1 from day 2, and than the value of day 2 from day 3 and so on and so forth.

Here's my code so far, i know this only can be a noob problem but i'm very new to programming so i'm thankful for every answer.

  import os
    import glob
    import pandas as pd
    
    # set working directory
    os.chdir("Path to CSVs")
    
    # find all csv files in the folder
    # use glob pattern matching -> extension = 'csv'
    # save result in list -> all_filenames
    extension = 'csv'
    all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
    # print(all_filenames)
    
    # combine all files in the list
    combined_csv = pd.concat([pd.read_csv(f, sep=';') for f in all_filenames])
    # Format CSV
    # Transform Timestamp column into datetime
    combined_csv['Timestamp'] = pd.to_datetime(combined_csv.Timestamp)
    # Read out first entry of every day of every month
    combined_csv = combined_csv.resample('D', on='Timestamp')['HtmDht_Energy'].agg(['first'])
    # To get the yield of day i have to subtract day 2 HtmDht_Energy - day 1 HtmDht_Energy
    #
    #
    #
    #
    print(combined_csv.head())
    # export as excel file
    # combined_csv.to_excel('energy_data.xlsx', index=False)

After i tried the solution from below with the following code:

 import os
    import glob
    import pandas as pd
    
    # set working directory
    os.chdir("C:/Users/winklerm/OneDrive - SOLID Solar Energy Systems GmbH/Desktop/CSVs")
    
    # find all csv files in the folder
    # use glob pattern matching -> extension = 'csv'
    # save result in list -> all_filenames
    extension = 'csv'
    all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
    # print(all_filenames)
    
    # combine all files in the list
    combined_csv = pd.concat([pd.read_csv(f, sep=';') for f in all_filenames])
    # Format CSV
    # Transform Timestamp column into datetime
    combined_csv['Timestamp'] = pd.to_datetime(combined_csv.Timestamp)
    # Read out first entry of every day of every month
    combined_csv = combined_csv.resample('D', on='Timestamp')['HtmDht_Energy'].agg(['first'])
    # To get the yield of day i have to subtract day 2 HtmDht_Energy - day 1 HtmDht_Energy
    combined_csv.assign(tagesertrag=combined_csv["first"]-combined_csv["first"].shift())
    #
    #
    #
    print(combined_csv.head())
    # export as excel file
    # combined_csv.to_excel('energy_data.xlsx', index=False)

But my print still looks like this:

                  first
Timestamp              
2021-03-11  596319.8125
2021-03-12  596349.5625
2021-03-13  602205.1875
2021-03-14  609445.5625
2021-03-15  609479.6250

Process finished with exit code 0

Upvotes: 0

Views: 59

Answers (1)

Rob Raymond
Rob Raymond

Reputation: 31226

This is a simple case of subtracting shift() value

df = pd.read_csv(io.StringIO(""" Timestamp                    first
2021-03-11  596319.8125
2021-03-12  596349.5625
2021-03-13  602205.1875
2021-03-14  609445.5625
2021-03-15  609479.6250"""), sep="\s+")

df.assign(yieldc=df["first"]-df["first"].shift())
Timestamp first yieldc
0 2021-03-11 596320 nan
1 2021-03-12 596350 29.75
2 2021-03-13 602205 5855.62
3 2021-03-14 609446 7240.38
4 2021-03-15 609480 34.0625

Upvotes: 1

Related Questions