Reputation: 1
Example, if a person worked from 12/1/2020 to 4/17/2022 and their annual salary is 40000
Employee start date | Employee end date | Annual Salary |
---|---|---|
12/1/2020 | 4/17/2022 | 40000 |
It should display the below when 2021 is selected(In PowerBI), since they are employed throughout 2021(40000/12) (Year starts from Dec through Nov)
Dec | Jan | Feb | Mar | Apr | May | June | July | August | Sep | Oct | Nov |
---|---|---|---|---|---|---|---|---|---|---|---|
3,333.33 | 3,333.33 | 3,333.33 | 3,333.33 | 3,333.33 | 3,333.33 | 3,333.33 | 3,333.33 | 3,333.33 | 3,333.33 | 3,333.33 | 3,333.33 |
when 2022 is selected it should be 40,000/5, since they only worked from Dec 2021 - April 2022
Dec | Jan | Feb | Mar | Apr | May | June | July | August | Sep | Oct | Nov |
---|---|---|---|---|---|---|---|---|---|---|---|
8,000.00 | 8,000.00 | 8,000.00 | 8,000.00 | 8,000.00 |
I have the following figured out until now, but do not know how to proceed.
dataset["Employee start date"] = pd.to_datetime(dataset["Employee start date"], format="%m/%d/%Y")
dataset["Employee end date"] = pd.to_datetime(dataset["Employee end date"], format="%m/%d/%Y")
dataset['months'] = [pd.period_range(s, e, freq='m') for s, e in zip(dataset['Employee start date'], dataset["Employee end date"]) ]
dataset = dataset.explode('months').reset_index(drop=True)
Any help will be appreciated, thanks!
Upvotes: 0
Views: 46
Reputation: 54635
Here's one way to think about it. This pre-processes your input file as a CSV, producing rows that I read in with pandas. This produces rows for every year, where the row includes the year and the ID number:
import csv
import pandas as pd
import datetime
def fiscalyear(dt):
return dt.year if dt.month < 12 else dt.year+1
def fiscalmonth(dt):
return dt.month+1 if dt.month < 12 else 1
# Read the data.
columns = []
data = []
for row in csv.reader(open('x.csv')):
if not columns:
columns = row
else:
nid = int(row[0])
start = datetime.datetime.strptime(row[1], '%m/%d/%Y')
end = datetime.datetime.strptime(row[2], '%m/%d/%Y')
sal = float(row[3])
data.append( (nid,start,end,sal) )
# Find minimum year.
first = min(fiscalyear(d[1]) for d in data)
last = 2023
headers = ['Year','ID','Dec','Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov']
def plan_a():
# Write a CSV.
with open('out.csv','w') as f:
out = csv.writer(f)
out.writerow(headers)
for year in range(first,last+1):
for nid,start,end,sal in data:
if fiscalyear(start) <= year <= fiscalyear(end):
smo = fiscalmonth(start) if fiscalyear(start)== year else 1
emo = fiscalmonth(end) if fiscalyear(end) == year else 12
nmo = emo - smo + 1
row = [year,nid] + ['']*(smo-1) + [sal/12]*nmo + ['']*(12-emo)
out.writerow(row)
# Read it.
df = pd.read_csv( "out.csv" )
print(df)
def plan_b():
# Create a list.
out = []
for year in range(first,last+1):
for nid,start,end,sal in data:
if fiscalyear(start) <= year <= fiscalyear(end):
smo = fiscalmonth(start) if fiscalyear(start)== year else 1
emo = fiscalmonth(end) if fiscalyear(end) == year else 12
nmo = emo - smo + 1
row = [year,nid] + ['']*(smo-1) + [sal/12]*nmo + ['']*(12-emo)
out.append(row)
# Make a dataframe.
df = pd.DataFrame(out, columns=headers)
print(df)
plan_b()
Output:
Year ID Dec ... Sep Oct Nov
0 1987 73 ... 471.404267 471.404267 471.404267
1 1988 56 ... 3709.038667 3709.038667 3709.038667
2 1988 73 471.404267 ... 471.404267 471.404267 471.404267
3 1988 94 ... 4433.433492 4433.433492 4433.433492
4 1989 56 3709.038667 ... 3709.038667 3709.038667 3709.038667
.. ... .. ... ... ... ... ...
200 2023 70 6860.585783 ... 6860.585783 6860.585783 6860.585783
201 2023 73 471.404267 ... 471.404267 471.404267 471.404267
202 2023 85 6369.825 ... 6369.825 6369.825 6369.825
203 2023 94 4433.433492 ... 4433.433492 4433.433492 4433.433492
204 2023 96 5331.354233 ... 5331.354233 5331.354233 5331.354233
[205 rows x 14 columns]
The unused plan_a
there writes the data to a CSV and then reads the CSV. That would be nice if you needed the CSV for other things.
Upvotes: 0