obscuredbyclouds
obscuredbyclouds

Reputation: 189

loop through multiple dataframes and create datetime index then join dataframes

I have 9 dataframes of different lengths but similar formats. Each dataframe has a year, month, and day column with dates that span from 1/1/2009-12/31/2019, but some dataframes are missing data for some days. I would like to build one large dataframe with a DateTime Index, but I am having trouble creating a loop to convert the year, month, and day columns to a datetime index for each dataframe, and don't know which function to use to join the dataframes together. I have one dataframe called Temp that has all 4017 lines of data for every day of the 11 year period but the rest of the dataframes are missing some dates.

import pandas as pd

#just creating some sample data to make it easier

Temp = pd.DataFrame({'year':[2009,2009,2009,2010,2010,2010,2011,2011,2011,2012,2012,2012,2013,2013,2013,
2014,2014,2014,2015,2015,2015],'month':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
'day':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
'T1':[20,21,25,28,30,33,39,35,34,34,31,30,27,24,20,21,25,28,30,33,39],
'T2':[33,39,35,34,34,31,30,27,24,20,21,25,28,30,33,39,20,21,25,28,30]})

WS = pd.DataFrame({'year':[2009,2009,2010,2011,2011,2011,2012,2012,2012,2013,2013,2013,
2014,2014,2014,2015,2015,2015],'month':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
'day':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
'WS1':[5.4,5.1,5.2,4.3,4.4,4.4,1.2,1.5,1.6,2.3,2.5,3.1,2.5,4.6,4.4,4.4,1.2,1.5],
'WS2':[5.4,5.1,4.4,4.4,1.2,1.5,1.6,2.3,2.5,5.2,4.3,4.4,4.4,1.2,1.5,1.6,2.3,2.5]})

RH = pd.DataFrame({'year':[2009,2009,2010,2011,2011,2011,2012,2012,2012,2013,2013,2013,
2014,2014,2014],'month':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
'day':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
'RH1':[33,38,30,45,52,60,61,66,60,59,30,45,52,60,61], 
'RH2':[33,38,59,30,45,52,60,61,30,45,52,60,61,66,60]})

Okay, so far what I have tried was to first create a loop that would convert the year, month, and day columns into a DateTime index and drop the remaining year, month, and day columns.

df = [Temp, WS, RH]

for dfs in df:
    dfs['date'] = pd.to_datetime(dfs[['year','month','day']])
    dfs.set_index(['date'],inplace=True)
    dfs.drop(columns = ['year','month','day'],inplace=True)

But I keep getting errors that say TypeError: tuple indices must be integers or slices, not list or TypeError: list indices must be integers or slices, not list. Since I can't get over this issue, I'm having trouble discerning what to do after in order to merge all the dataframes together. I assume that I will have to set an index like idx = pd.date_range('2018-01-01 00:00:00', '2018-12-31 23:00:00', freq='H') and then reset_index for the dataframes that are missing the data. And then, couldn't I use a left-join or concatenate since they would all have the same index? The dataframe examples given above do not have the desired date range, I just didn't know how else to make sample dataframes.

Upvotes: 2

Views: 680

Answers (1)

Corralien
Corralien

Reputation: 120391

Is it what are you looking for?

dfs = [Temp, WS, RH]

data = []
for df in dfs:
    data.append(df.set_index(pd.to_datetime(df[["year", "month", "day"]]))
                  .drop(columns=["year", "month", "day"]))
out = pd.concat(data, axis="columns")
>>> out
            T1  T2  WS1  WS2   RH1   RH2
2009-01-01  20  33  5.4  5.4  33.0  33.0
2009-02-02  21  39  5.1  5.1  38.0  38.0
2009-03-03  25  35  NaN  NaN   NaN   NaN
2010-01-01  28  34  NaN  NaN   NaN   NaN
2010-02-02  30  34  NaN  NaN   NaN   NaN
2010-03-03  33  31  5.2  4.4  30.0  59.0
2011-01-01  39  30  4.3  4.4  45.0  30.0
2011-02-02  35  27  4.4  1.2  52.0  45.0
2011-03-03  34  24  4.4  1.5  60.0  52.0
2012-01-01  34  20  1.2  1.6  61.0  60.0
2012-02-02  31  21  1.5  2.3  66.0  61.0
2012-03-03  30  25  1.6  2.5  60.0  30.0
2013-01-01  27  28  2.3  5.2  59.0  45.0
2013-02-02  24  30  2.5  4.3  30.0  52.0
2013-03-03  20  33  3.1  4.4  45.0  60.0
2014-01-01  21  39  2.5  4.4  52.0  61.0
2014-02-02  25  20  4.6  1.2  60.0  66.0
2014-03-03  28  21  4.4  1.5  61.0  60.0
2015-01-01  30  25  4.4  1.6   NaN   NaN
2015-02-02  33  28  1.2  2.3   NaN   NaN
2015-03-03  39  30  1.5  2.5   NaN   NaN

Upvotes: 2

Related Questions