Reputation: 21
I am struggling to average my data into monthly mean taking into account that the data can represent two months (and years), and there are empty periods. Thus a need to weigh the samples accordantly, i.e. a sample with 2 days in one month and 5 in the next should have 2/7 and 5/7 weight, respectively. I have tried to first reindex to daily data and then resample, but problems with empty periods (do not want to fill these) and irregular sampling periods. Surly I can calculate the duration for each periods (end-start), but not sure if that helps because also need to know the split between months.
Here is part of the data:
start end conc
26.12.2019 07:00 03.01.2020 07:00 0.235
03.01.2020 07:00 14.01.2020 07:00 nan
14.01.2020 07:00 20.01.2020 07:00 0.032
21.01.2020 07:00 29.01.2020 07:00 1.601
29.01.2020 07:00 06.02.2020 07:00 0.815
06.02.2020 07:00 10.02.2020 07:00 nan
10.02.2020 07:00 20.02.2020 07:00 3.029
20.02.2020 07:00 28.02.2020 07:00 2.158
28.02.2020 07:00 09.03.2020 07:00 0.784
09.03.2020 07:00 18.03.2020 07:00 0.304
The results should be:
jan.2020 0.829
feb.2020 2.128
Upvotes: 2
Views: 203
Reputation: 81
From a software design prospective, it looks like you need to do some data cleaning before trying to interact with the data.
You need to define what you want to do. Do you want to get rid of all the nan
values, or treat them as 0? What if an event starts in one month but ends in another? Once you have these questions, you can start thinking about what to code. Below would be the approach I would take (But I don't know exactly what you want to do with the data):
NaN
sstart end Duration month conc
26.12.2019 07:00 03.01.2020 07:00 x sec Dec 0.235
03.01.2020 07:00 14.01.2020 07:00 y sec Jan nan
14.01.2020 07:00 20.01.2020 07:00 z sec Jan 0.032
avg = tot_conc / tot_duration
. Then add this value to a new table: | month | avg_conc |
| Feb. | 2.218. |
Upvotes: 0