progster
progster

Reputation: 937

loop to split dataframes according to values of variable

I would like to create different df, one for each possible combinations of two variables (both variables have 3 items, so the combinations are 9).

Each df should be named according to the values of the variables, for instance df_A_N.

Here my intents, but I cannot assign the name or generalize to more variables:

df = pd.DataFrame({'v1': pd.Categorical(["A","A","AA","AAA"]),'v2': pd.Categorical(["N","N","W","E"])})
df

n = []
for i in df.v1:
    if i not in n:
        n.append(i)
print (n)

for i in n:
    df_new=df[df.v1==i]

df_new

an example of expected output

df_new_A_N = pd.DataFrame({'v1': pd.Categorical(["A"]),'v2': pd.Categorical(["N"])})

df_new_A_W = pd.DataFrame({'v1': pd.Categorical(["A"]),'v2': pd.Categorical(["W"])})

etc.

Upvotes: 0

Views: 117

Answers (2)

Andrew L
Andrew L

Reputation: 7038

You'll need to use ExcelWriter to save multiple sheets in one doc (as far as I know). To address the second part of your question try the following:

writer = pd.ExcelWriter(r"/yourpath/excel_doc.xlsx")
for i in df.v1.unique():
    for x in df.v2.unique():
        temp = df[(df['v1'] == i) & (df['v2'] == x)]
        temp.to_excel(writer, "df_"+i+"_"+x)

Same structure as the above.

Upvotes: 1

Andrew L
Andrew L

Reputation: 7038

Don't know why you would want to do this but here's one way:

for i in df.v1.unique():
    for x in df.v2.unique():
        globals()["df_"+i+"_"+x] = df[(df['v1'] == i) & (df['v2'] == x)]

There may be a DataFrame method to do this but I haven't used it.

Upvotes: 1

Related Questions