Reputation: 5135
I have 3 DataFrames:
df_A:
year month day A
0 2014 1 1 15.8
1 2014 1 2 21.0
2 2014 1 3 22.3
3 2014 1 4 20.2
4 2014 1 5 20.0
... ... ... ... ...
df_B:
year month day B
0 2014 1 1 15.8
1 2014 1 2 21.0
2 2014 1 3 22.3
3 2014 1 4 20.2
4 2014 1 5 20.0
... ... ... ... ...
df_C:
year month day C
0 2014 1 1 15.8
1 2014 1 2 21.0
2 2014 1 3 22.3
3 2014 1 4 20.2
4 2014 1 5 20.0
... ... ... ... ...
I want to (1) join them side by side and (2) combine the three date columns into a single date column, so that the result looks like this:
date A B C
0 2014-1-1 15.8 15.8 15.8
1 2014-1-2 21.0 21.0 21.0
2 2014-1-3 22.3 22.3 22.3
3 2014-1-4 20.2 20.2 20.2
4 2014-1-5 20.0 20.0 20.0
... ... ... ... ...
I tried:
df_A['date']=pd.to_datetime(df_A[['year','month','day']])
but it raised:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-12-88de4e50b4f6> in <module>()
----> 1 df_A['date']=pd.to_datetime(df_A[['year','month','day']])
2
3
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1183 if not (self.name == "loc" and not raise_missing):
1184 not_found = list(set(key) - set(ax))
-> 1185 raise KeyError("{} not in index".format(not_found))
1186
1187
KeyError: "['year'] not in index"
What is the best way to do this?
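The KeyError means df_A has no column labelled exactly 'year'; the post does not show the real column labels, so the cause is unconfirmed. A minimal sketch, assuming one of two common causes (stray whitespace in a header, or 'year' having been moved into the index), that inspects and repairs the labels before retrying the same pd.to_datetime call. The frame below is hypothetical sample data built to reproduce the error:

```python
import pandas as pd

# Hypothetical frame reproducing the error: the header has a leading space
# (" year"), so df_A[["year", "month", "day"]] raises KeyError.
df_A = pd.DataFrame({" year": [2014, 2014], "month": [1, 1], "day": [1, 2], "A": [15.8, 21.0]})
print(df_A.columns.tolist())  # shows the exact labels, including whitespace

# Strip surrounding whitespace from every column label.
df_A.columns = df_A.columns.str.strip()

# If "year" had been moved into the index (e.g. by set_index), bring it back.
if "year" in df_A.index.names:
    df_A = df_A.reset_index()

# pandas can assemble datetimes from columns named year/month/day.
df_A["date"] = pd.to_datetime(df_A[["year", "month", "day"]])
print(df_A)
```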
Upvotes: 1
Views: 92
Reputation: 4215
Try:
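import pandas as pd

keys = ['year', 'month', 'day']
# Index each frame by its date parts and join them side by side
df_result = (df_A.set_index(keys)
             .join([df_B.set_index(keys), df_C.set_index(keys)])
             .reset_index())
# Build YYYYMMDD integers from the joined frame and parse them as dates
df_result['date'] = pd.to_datetime((df_result.year*10000 + df_result.month*100 + df_result.day).astype(str), format='%Y%m%d')
df_result.drop(['year', 'month', 'day'], axis=1, inplace=True)
df_result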
A B C date
0 15.8 15.8 15.8 2014-01-01
1 21.0 21.0 21.0 2014-01-02
2 22.3 22.3 22.3 2014-01-03
3 20.2 20.2 20.2 2014-01-04
4 20.0 20.0 20.0 2014-01-05
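A self-contained version of the join approach above, runnable as-is. The three frames are rebuilt from the first three rows of the question's sample data, and keys is just a local name for the three date columns:

```python
import pandas as pd

# Sample data: the first three rows shown in the question.
rows = {"year": [2014] * 3, "month": [1] * 3, "day": [1, 2, 3]}
df_A = pd.DataFrame({**rows, "A": [15.8, 21.0, 22.3]})
df_B = pd.DataFrame({**rows, "B": [15.8, 21.0, 22.3]})
df_C = pd.DataFrame({**rows, "C": [15.8, 21.0, 22.3]})

keys = ["year", "month", "day"]

# Index each frame by its date parts and join the other two onto df_A.
df_result = (
    df_A.set_index(keys)
    .join([df_B.set_index(keys), df_C.set_index(keys)])
    .reset_index()
)

# Encode each date as the integer YYYYMMDD, e.g. 2014*10000 + 1*100 + 3 = 20140103,
# then parse it with an explicit format.
ymd = df_result["year"] * 10000 + df_result["month"] * 100 + df_result["day"]
df_result["date"] = pd.to_datetime(ymd.astype(str), format="%Y%m%d")

df_result = df_result.drop(columns=keys)
print(df_result)
```

Since join matches rows on the (year, month, day) index rather than on row position, the three frames do not need to be sorted the same way.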
Upvotes: 0
Reputation: 23099
IIUC, we can use a function that turns the three date columns into a single datetime index, and then concat the frames along axis=1:
import pandas as pd

def create_datetime(dataframe, year="year", month="month", day="day"):
    # work on a copy so the caller's frame is not modified
    dataframe = dataframe.copy()
    # overwrite the year column with the full date, e.g. "2014-1-1" -> 2014-01-01
    dataframe[year] = pd.to_datetime(
        dataframe[year].astype(str) + "-" + dataframe[month].astype(str) + "-" + dataframe[day].astype(str),
        format="%Y-%m-%d",
    )
    dataframe = dataframe.drop([month, day], axis=1)
    dataframe = dataframe.rename(columns={year: 'date'})
    dataframe = dataframe.set_index('date')
    return dataframe

df_A = create_datetime(df_A)
df_B = create_datetime(df_B)
df_C = create_datetime(df_C)
final = pd.concat([df_A, df_B, df_C], axis=1)
A B C
date
2014-01-01 15.8 15.8 15.8
2014-01-02 21.0 21.0 21.0
2014-01-03 22.3 22.3 22.3
2014-01-04 20.2 20.2 20.2
2014-01-05 20.0 20.0 20.0
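A variant of the concat approach, sketched under the assumption that the date columns are named exactly year, month and day: pd.to_datetime can assemble dates directly from such columns, which avoids building strings. date_indexed and DATE_PARTS are hypothetical names, and the frames are rebuilt from the first three rows of the question:

```python
import pandas as pd

# Hypothetical constant: the column names pd.to_datetime expects when assembling dates.
DATE_PARTS = ["year", "month", "day"]


def date_indexed(frame):
    """Hypothetical helper: replace the date-part columns with a DatetimeIndex."""
    date = pd.to_datetime(frame[DATE_PARTS])
    return frame.drop(columns=DATE_PARTS).set_index(pd.DatetimeIndex(date, name="date"))


# Sample data: the first three rows shown in the question.
rows = {"year": [2014] * 3, "month": [1] * 3, "day": [1, 2, 3]}
frames = [pd.DataFrame({**rows, name: [15.8, 21.0, 22.3]}) for name in ("A", "B", "C")]

# concat along axis=1 aligns the frames on their shared date index.
final = pd.concat([date_indexed(f) for f in frames], axis=1)
print(final)
```

Because concat aligns on the index (outer join by default), a date missing from one frame shows up as NaN in that frame's column instead of shifting rows.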
Upvotes: 1