Reputation: 75
As usual, I've bitten off more than I can chew. I have a file, "list.xlsx". The file has 3 sheets, "current students", "finished" and "cancelled". The sheets all contain data under the following headers [StudentId, FirstName, Lastname, DoB, Nationality, CourseID, CourseName, Startdate, Finishdate, UnitID, UnitName, UnitCompetency]
I've created the below abomination that starts off what I need.
What I want it to do is:
1) Create a file with FirstName + Lastname.xlsx based off of the StudentId (unique) in the folder named after their sheet
2) In that file, take all information from the rest of the columns and append it to their file
#python 3.8
import pandas as pd
import os
import shutil
file = "list.xlsx"
CS = "current student"
Fin = "finished"
Can = "cancelled"
TheList = {CS, Fin, Can}
CanXlsx = pd.read_excel(file, sheet_name = Can)
FinXlsx = pd.read_excel(file, sheet_name = Fin)
CSXlsx = pd.read_excel(file, sheet_name = CS)
if os.path.exists(CS):
shutil.rmtree(CS)
os.mkdir(CS)
CSDir = '//current student//'
if os.path.exists(Fin):
shutil.rmtree(Fin)
os.mkdir(Fin)
FinDir = '//finished//'
if os.path.exists(Can):
shutil.rmtree(Can)
os.mkdir(Can)
CanDir = '//cancelled//'
CancelID = CanXlsx.StudentId.unique()
FinID = FinXlsx.StudentId.unique()
CSID = CSXlsx.StudentId.unique()
I thought I was getting better with for loops and the like, but can't seem to get my head around them. I can think about the logic, but it just doesn't come through with the code.
https://drive.google.com/file/d/134fqWx6veF7zp_12GqFYlbmPZnK8ihaV/view?usp=sharing
Upvotes: 0
Views: 88
Reputation: 1473
I think the approach needed for this is to create 3 data frames (probably is possible to do it with one, but I don't remember). 1) Then, on each data frame, you will need to extract a list of "First Name + Last Name", and after that, 2) you will need to create masks on the data frames to extract the information and save it.
import os
import shutil
file = "list.xlsx"
CS = "current student"
Fin = "finished"
Can = "cancelled"
TheList = {CS, Fin, Can}
CanXlsx = pd.read_excel(file, sheet_name = Can)
FinXlsx = pd.read_excel(file, sheet_name = Fin)
CSXlsx = pd.read_excel(file, sheet_name = CS)
## File Creation
if os.path.exists(CS):
shutil.rmtree(CS)
os.mkdir(CS)
CSDir = '//current student//'
if os.path.exists(Fin):
shutil.rmtree(Fin)
os.mkdir(Fin)
FinDir = '//finished//'
if os.path.exists(Can):
shutil.rmtree(Can)
os.mkdir(Can)
CanDir = '//cancelled//'
# Create full names
CanXlsx["Fullname"] = CanXlsx["StudentId"] + "_" + CanXlsx["First Name"] + "_" + CanXlsx["Last Name"]
## Same for the other dfs
# Get a list of ids
# canFullNames = list(CanXlsx["Fullname"]) Edit: Preferred approach through student Ids
canIds = list(CanXlsx["StudentId"])
## Same for the other dfs
# Loop over the list of full names to create your df
for id in canIds:
df1 = CanXlsx[CanXlsx["StudenId"] == id] # This will filter the rows by the id you want
# Retrieve the full name
name = df1.iloc[0]["Fullname"]
# Create the filename
filename = os.path.join(CanDir,name + ".xlsx")
df1.drop(columns = ["First Name", "Last Name"] # I understand that these columns are not required on each file
df1.to_excel(filename,header=True,index=False)
## Same for the other dfs
Let me know if this helps, at least this is what I understood you want to achieve with your code. :D
Upvotes: 1