Reputation: 109
I'm just learning pandas working on a Jupyter Notebook. I have 10 datasets containing daily temperatures from different years. However, the dates are in different columns for months, days and years, and in a different column the temperature. The original data have spaces in the column names. For example, the information for df1info() is the following:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17301 entries, 0 to 17300
Data columns (total 4 columns):
agno 17301 non-null int64
mes 17301 non-null int64
dia 17301 non-null int64
valor 17301 non-null float64
dtypes: float64(1), int64(3)
memory usage: 540.7 KB
I did write a the following code which perform the job after checking different solutions in this community, I wonder about a more efficient and pandorable way to do it.
list_df = [df1,df2,df3,df4,df5,df6,df7,df8,df9,df10]
for i, df in enumerate(list_dfs):
df = list_df[i]
df = df.rename(columns=lambda x: x.strip())
df['mes'] = df['mes'].astype(str)
df['dia'] = df['dia'].astype(str)
df['combined']=df['agno'].astype(str) + "-" + df["mes"] + "-" + df["dia"]
df["date"]= pd.to_datetime(df["combined"])
df = df[['date','valor']]
list_df[i] = df
i = i + 1
I would appreciate your help. Thanks in advance.
Upvotes: 0
Views: 588
Reputation: 210832
I'd do it this way:
def f(df):
df['date'] = (pd.to_datetime(
df.rename(
columns={'agno':'year', 'mes':'month', 'dia':'day'})
[['year','month','day']]))
return df.drop([['year','month','day']], axis=1)
df = pd.concat([f(x) for x in list_df], ignore_index=True)
Upvotes: 3
Reputation: 323226
I think it can be down by concat
, this will create the multiple index data frame, you can just do those adjustment with the combine dataframe , then just groupby
by the key
(level=0 here) and split those single df to a new list you want
s=pd.concat(list_df,keys=list(range(len(list_df))))
s = s.rename(columns=lambda x: x.strip())
s['mes'] = s['mes'].astype(str)
s['dia'] = s['dia'].astype(str)
s['combined']=s['agno'].astype(str) + "-" + s["mes"] + "-" + s["dia"]
s["date"]= pd.to_datetime(s["combined"])
s = s[['date','valor']]
list_df=[d for _,d in s.groupby(level=0)]
Upvotes: 3