Reputation: 109

pandas: looping through several dataframes for efficiency

I'm just learning pandas working on a Jupyter Notebook. I have 10 datasets containing daily temperatures from different years. However, the dates are in different columns for months, days and years, and in a different column the temperature. The original data have spaces in the column names. For example, the information for df1info() is the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17301 entries, 0 to 17300
Data columns (total 4 columns):
agno      17301 non-null int64
mes      17301 non-null int64
dia      17301 non-null int64
valor    17301 non-null float64
dtypes: float64(1), int64(3)
memory usage: 540.7 KB

I did write a the following code which perform the job after checking different solutions in this community, I wonder about a more efficient and pandorable way to do it.

list_df = [df1,df2,df3,df4,df5,df6,df7,df8,df9,df10]
for i, df in enumerate(list_dfs):
df = list_df[i]
df = df.rename(columns=lambda x: x.strip())
df['mes'] = df['mes'].astype(str)
df['dia'] = df['dia'].astype(str)
df['combined']=df['agno'].astype(str) + "-" + df["mes"] + "-" + df["dia"]
df["date"]= pd.to_datetime(df["combined"])
df = df[['date','valor']]
list_df[i] = df
i = i + 1

I would appreciate your help. Thanks in advance.

Upvotes: 0

Answers (2)

MaxU - stand with Ukraine

Reputation: 210832

I'd do it this way:

def f(df):
    df['date'] = (pd.to_datetime(
                    df.rename(
                        columns={'agno':'year', 'mes':'month', 'dia':'day'})
                    [['year','month','day']]))
    return df.drop([['year','month','day']], axis=1)

df = pd.concat([f(x) for x in list_df], ignore_index=True)

Upvotes: 3

BENY

Reputation: 323226

I think it can be down by concat, this will create the multiple index data frame, you can just do those adjustment with the combine dataframe , then just groupby by the key(level=0 here) and split those single df to a new list you want

s=pd.concat(list_df,keys=list(range(len(list_df))))

s = s.rename(columns=lambda x: x.strip())
s['mes'] = s['mes'].astype(str)
s['dia'] = s['dia'].astype(str)
s['combined']=s['agno'].astype(str) + "-" + s["mes"] + "-" + s["dia"]
s["date"]= pd.to_datetime(s["combined"])
s = s[['date','valor']]


list_df=[d for _,d in s.groupby(level=0)]

Upvotes: 3

pandas: looping through several dataframes for efficiency

Answers (2)

Related Questions