Multi tab Excel sheets, unique entries in 1 column, create new file with data from another column as the name, all with headers

Question

As usual, I've bitten off more than I can chew. I have a file, "list.xlsx". The file has 3 sheets, "current students", "finished" and "cancelled". The sheets all contain data under the following headers [StudentId, FirstName, Lastname, DoB, Nationality, CourseID, CourseName, Startdate, Finishdate, UnitID, UnitName, UnitCompetency]

I've created the below abomination that starts off what I need.

What I want it to do is:

1) Create a file with FirstName + Lastname.xlsx based off of the StudentId (unique) in the folder named after their sheet

2) In that file, take all information from the rest of the columns and append it to their file

    #python 3.8
import pandas as pd
import os
import shutil

file = "list.xlsx"
CS = "current student"
Fin = "finished"
Can = "cancelled"
TheList = {CS, Fin, Can}
CanXlsx = pd.read_excel(file, sheet_name = Can)
FinXlsx = pd.read_excel(file, sheet_name = Fin)
CSXlsx = pd.read_excel(file, sheet_name = CS)

if os.path.exists(CS):
    shutil.rmtree(CS)
os.mkdir(CS)
CSDir = '//current student//'
if os.path.exists(Fin):
    shutil.rmtree(Fin)
os.mkdir(Fin)
FinDir = '//finished//'
if os.path.exists(Can):
    shutil.rmtree(Can)
os.mkdir(Can)
CanDir = '//cancelled//'

CancelID = CanXlsx.StudentId.unique()
FinID = FinXlsx.StudentId.unique()
CSID = CSXlsx.StudentId.unique()

I thought I was getting better with for loops and the like, but can't seem to get my head around them. I can think about the logic, but it just doesn't come through with the code.

https://drive.google.com/file/d/134fqWx6veF7zp_12GqFYlbmPZnK8ihaV/view?usp=sharing

EnriqueBet · Accepted Answer

I think the approach needed for this is to create 3 data frames (probably is possible to do it with one, but I don't remember). 1) Then, on each data frame, you will need to extract a list of "First Name + Last Name", and after that, 2) you will need to create masks on the data frames to extract the information and save it.

import os
import shutil

file = "list.xlsx"
CS = "current student"
Fin = "finished"
Can = "cancelled"
TheList = {CS, Fin, Can}
CanXlsx = pd.read_excel(file, sheet_name = Can)
FinXlsx = pd.read_excel(file, sheet_name = Fin)
CSXlsx = pd.read_excel(file, sheet_name = CS)

## File Creation
if os.path.exists(CS):
    shutil.rmtree(CS)
os.mkdir(CS)
CSDir = '//current student//'
if os.path.exists(Fin):
    shutil.rmtree(Fin)
os.mkdir(Fin)
FinDir = '//finished//'
if os.path.exists(Can):
    shutil.rmtree(Can)
os.mkdir(Can)
CanDir = '//cancelled//'

# Create full names
CanXlsx["Fullname"] = CanXlsx["StudentId"] + "_" + CanXlsx["First Name"] + "_" + CanXlsx["Last Name"]
## Same for the other dfs

# Get a list of ids
# canFullNames = list(CanXlsx["Fullname"]) Edit: Preferred approach through student Ids
canIds = list(CanXlsx["StudentId"])
## Same for the other dfs

# Loop over the list of full names to create your df
for id in canIds:
    df1 = CanXlsx[CanXlsx["StudenId"] == id] # This will filter the rows by the id you want
    # Retrieve the full name
    name = df1.iloc[0]["Fullname"]

    # Create the filename
    filename = os.path.join(CanDir,name + ".xlsx")

    df1.drop(columns = ["First Name", "Last Name"] # I understand that these columns are not required on each file
    df1.to_excel(filename,header=True,index=False)

## Same for the other dfs

Let me know if this helps, at least this is what I understood you want to achieve with your code. :D

Multi tab Excel sheets, unique entries in 1 column, create new file with data from another column as the name, all with headers

Answers (1)

Related Questions