Pad

Reputation: 911

Create new dataframe from multiple multi-index dataframes

I want to create a new DataFrame spanning a given number of years, where each year is assembled from randomly chosen seasons of previous weather data.

Code to illustrate the problem:

import pandas as pd
import numpy as np

dates = pd.date_range('20070101',periods=3200)
df = pd.DataFrame(data=np.random.randint(0,100,(3200,1)), columns =list('A'))
df['date'] = dates
df = df[['date','A']]

Define a function that maps each row's date to a season ('1' = winter, '2' = spring, '3' = summer, '4' = autumn)

def get_season(row):
    if row['date'].month >= 3 and row['date'].month <= 5:
        return '2'
    elif row['date'].month >= 6 and row['date'].month <= 8:
        return '3'
    elif row['date'].month >= 9 and row['date'].month <= 11:
        return '4'
    else:
        return '1'

Apply the function

df['Season'] = df.apply(get_season, axis=1)
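The row-wise apply above works, but it can be slow on long records. A minimal vectorized sketch of the same mapping follows; the helper name season_labels is hypothetical and not part of the original code, and it assumes the same labels as get_season ('1' for Dec-Feb through '4' for Sep-Nov).

```python
import pandas as pd

def season_labels(dates: pd.Series) -> pd.Series:
    """Map each date to '1' (winter), '2' (spring), '3' (summer) or '4' (autumn)."""
    month = dates.dt.month
    # month % 12 turns December into 0, so Dec/Jan/Feb -> 1, Mar-May -> 2,
    # Jun-Aug -> 3, Sep-Nov -> 4 after integer division by 3.
    codes = (month % 12) // 3 + 1
    return codes.astype(str)

# Same date range as in the question
dates = pd.Series(pd.date_range('20070101', periods=3200))
seasons = season_labels(dates)
```

With the question's frame this would be used as df['Season'] = season_labels(df['date']).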

Create a 'Year' column for indexing

df['Year'] = df['date'].dt.year

Multi-index by Year and Season

df = df.set_index(['Year', 'Season'], inplace=False)

Create new dataframes based on season to select from

winters = df.query('Season == "1"')
springs = df.query('Season == "2"')
summers = df.query('Season == "3"')
autumns = df.query('Season == "4"')

I now want to create a new DataFrame which takes a random winter from the winters dataframe, followed by a random spring from springs, a random summer from summers and a random autumn from autumns, and repeats this for a specified number of years (e.g. 100), but I can't see how to do this.

EDIT:

Duplicate seasons are allowed (seasons should be sampled randomly with replacement), and the first spring does not have to come from the same year as the first winter.

EDIT 2: Solution using all seasonal dataframes:

years = df['date'].dt.year.unique()
outputyears = 100  # number of synthetic years to generate (example value)
dfs = []
for i in range(outputyears):
    # np.random.choice(years) returns a single year rather than a length-1 array
    dfs.append(winters.query("Year == %d" % np.random.choice(years)))
    dfs.append(springs.query("Year == %d" % np.random.choice(years)))
    dfs.append(summers.query("Year == %d" % np.random.choice(years)))
    dfs.append(autumns.query("Year == %d" % np.random.choice(years)))

rnd = pd.concat(dfs)

Upvotes: 0

Views: 518

Answers (1)

MaxU - stand with Ukraine

Reputation: 210982

It's most probably not the best way to do it, but you can do it this way:

years = df['date'].dt.year.unique()

dfs = []
for i in range(100):
    # draw one year per season; np.random.choice(years) returns a scalar
    dfs.append(df.query("Year == %d and Season == '1'" % np.random.choice(years)))
    dfs.append(df.query("Year == %d and Season == '2'" % np.random.choice(years)))
    dfs.append(df.query("Year == %d and Season == '3'" % np.random.choice(years)))
    dfs.append(df.query("Year == %d and Season == '4'" % np.random.choice(years)))

rnd = pd.concat(dfs)
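If it helps, the same idea can be wrapped in a small function. This is only a sketch under stated assumptions: the name sample_seasons and its seed parameter are hypothetical, it uses NumPy's Generator API so runs can be reproduced, and it expects a frame indexed by ('Year', 'Season') with Season labels '1' to '4' as in the question. It draws only from years in which the given season actually occurs, so no empty piece is appended.

```python
import numpy as np
import pandas as pd

def sample_seasons(df, n_years, seed=None):
    """Concatenate n_years synthetic years, each made of four randomly drawn seasons.

    Assumes df is indexed by ('Year', 'Season') with Season labels '1'..'4'.
    """
    rng = np.random.default_rng(seed)
    year_idx = df.index.get_level_values('Year')
    season_idx = df.index.get_level_values('Season')
    season_order = ['1', '2', '3', '4']  # winter, spring, summer, autumn
    # Years available for each season, so an absent season is never drawn
    years_by_season = {
        s: year_idx[season_idx == s].unique().to_numpy() for s in season_order
    }
    pieces = []
    for _ in range(n_years):
        for s in season_order:
            year = rng.choice(years_by_season[s])  # sampled with replacement
            pieces.append(df[(year_idx == year) & (season_idx == s)])
    return pd.concat(pieces)

# Rebuild the question's frame so the sketch runs on its own
dates = pd.date_range('20070101', periods=3200)
df = pd.DataFrame({'date': dates, 'A': np.random.randint(0, 100, 3200)})
df['Season'] = ((df['date'].dt.month % 12) // 3 + 1).astype(str)  # Dec-Feb -> '1'
df['Year'] = df['date'].dt.year
df = df.set_index(['Year', 'Season'])

rnd = sample_seasons(df, n_years=100, seed=0)
```

Note that the last year in this sample data is incomplete, so a draw from it yields a shorter season.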

Upvotes: 1
