Reputation: 189
I have 9 dataframes of different lengths but similar formats. Each dataframe has a year
, month
, and day
column with dates that span from 1/1/2009-12/31/2019
, but some dataframes are missing data for some days. I would like to build one large dataframe with a DateTime Index, but I am having trouble creating a loop to convert the year, month, and day columns to a datetime index for each dataframe, and don't know which function to use to join the dataframes together. I have one dataframe called Temp
that has all 4017 lines of data for every day of the 11 year period but the rest of the dataframes are missing some dates.
import pandas as pd
#just creating some sample data to make it easier
Temp = pd.DataFrame({'year':[2009,2009,2009,2010,2010,2010,2011,2011,2011,2012,2012,2012,2013,2013,2013,
2014,2014,2014,2015,2015,2015],'month':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
'day':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
'T1':[20,21,25,28,30,33,39,35,34,34,31,30,27,24,20,21,25,28,30,33,39],
'T2':[33,39,35,34,34,31,30,27,24,20,21,25,28,30,33,39,20,21,25,28,30]})
WS = pd.DataFrame({'year':[2009,2009,2010,2011,2011,2011,2012,2012,2012,2013,2013,2013,
2014,2014,2014,2015,2015,2015],'month':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
'day':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
'WS1':[5.4,5.1,5.2,4.3,4.4,4.4,1.2,1.5,1.6,2.3,2.5,3.1,2.5,4.6,4.4,4.4,1.2,1.5],
'WS2':[5.4,5.1,4.4,4.4,1.2,1.5,1.6,2.3,2.5,5.2,4.3,4.4,4.4,1.2,1.5,1.6,2.3,2.5]})
RH = pd.DataFrame({'year':[2009,2009,2010,2011,2011,2011,2012,2012,2012,2013,2013,2013,
2014,2014,2014],'month':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
'day':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3],
'RH1':[33,38,30,45,52,60,61,66,60,59,30,45,52,60,61],
'RH2':[33,38,59,30,45,52,60,61,30,45,52,60,61,66,60]})
Okay, so far what I have tried was to first create a loop that would convert the year, month, and day columns into a DateTime index and drop the remaining year, month, and day columns.
df = [Temp, WS, RH]
for dfs in df:
dfs['date'] = pd.to_datetime(dfs[['year','month','day']])
dfs.set_index(['date'],inplace=True)
dfs.drop(columns = ['year','month','day'],inplace=True)
But I keep getting errors that say TypeError: tuple indices must be integers or slices, not list
or TypeError: list indices must be integers or slices, not list
. Since I can't get over this issue, I'm having trouble discerning what to do after in order to merge all the dataframes together. I assume that I will have to set an index like idx = pd.date_range('2018-01-01 00:00:00', '2018-12-31 23:00:00', freq='H')
and then reset_index for the dataframes that are missing the data. And then, couldn't I use a left-join or concatenate since they would all have the same index? The dataframe examples given above do not have the desired date range, I just didn't know how else to make sample dataframes.
Upvotes: 2
Views: 680
Reputation: 120391
Is it what are you looking for?
dfs = [Temp, WS, RH]
data = []
for df in dfs:
data.append(df.set_index(pd.to_datetime(df[["year", "month", "day"]]))
.drop(columns=["year", "month", "day"]))
out = pd.concat(data, axis="columns")
>>> out
T1 T2 WS1 WS2 RH1 RH2
2009-01-01 20 33 5.4 5.4 33.0 33.0
2009-02-02 21 39 5.1 5.1 38.0 38.0
2009-03-03 25 35 NaN NaN NaN NaN
2010-01-01 28 34 NaN NaN NaN NaN
2010-02-02 30 34 NaN NaN NaN NaN
2010-03-03 33 31 5.2 4.4 30.0 59.0
2011-01-01 39 30 4.3 4.4 45.0 30.0
2011-02-02 35 27 4.4 1.2 52.0 45.0
2011-03-03 34 24 4.4 1.5 60.0 52.0
2012-01-01 34 20 1.2 1.6 61.0 60.0
2012-02-02 31 21 1.5 2.3 66.0 61.0
2012-03-03 30 25 1.6 2.5 60.0 30.0
2013-01-01 27 28 2.3 5.2 59.0 45.0
2013-02-02 24 30 2.5 4.3 30.0 52.0
2013-03-03 20 33 3.1 4.4 45.0 60.0
2014-01-01 21 39 2.5 4.4 52.0 61.0
2014-02-02 25 20 4.6 1.2 60.0 66.0
2014-03-03 28 21 4.4 1.5 61.0 60.0
2015-01-01 30 25 4.4 1.6 NaN NaN
2015-02-02 33 28 1.2 2.3 NaN NaN
2015-03-03 39 30 1.5 2.5 NaN NaN
Upvotes: 2