kp987
kp987

Reputation: 1

convert a table to a different view

Example, if a person worked from 12/1/2020 to 4/17/2022 and their annual salary is 40000

Employee start date Employee end date Annual Salary
12/1/2020 4/17/2022 40000

It should display the below when 2021 is selected(In PowerBI), since they are employed throughout 2021(40000/12) (Year starts from Dec through Nov)

Dec Jan Feb Mar Apr May June July August Sep Oct Nov
3,333.33 3,333.33 3,333.33 3,333.33 3,333.33 3,333.33 3,333.33 3,333.33 3,333.33 3,333.33 3,333.33 3,333.33

when 2022 is selected it should be 40,000/5, since they only worked from Dec 2021 - April 2022

Dec Jan Feb Mar Apr May June July August Sep Oct Nov
8,000.00 8,000.00 8,000.00 8,000.00 8,000.00

I have the following figured out until now, but do not know how to proceed.

dataset["Employee start date"] = pd.to_datetime(dataset["Employee start date"], format="%m/%d/%Y")
dataset["Employee end date"] = pd.to_datetime(dataset["Employee end date"], format="%m/%d/%Y")

dataset['months'] = [pd.period_range(s, e, freq='m') for s, e in zip(dataset['Employee start date'], dataset["Employee end date"])  ]

dataset = dataset.explode('months').reset_index(drop=True)


Any help will be appreciated, thanks!

Upvotes: 0

Views: 46

Answers (1)

Tim Roberts
Tim Roberts

Reputation: 54635

Here's one way to think about it. This pre-processes your input file as a CSV, producing rows that I read in with pandas. This produces rows for every year, where the row includes the year and the ID number:

import csv
import pandas as pd
import datetime

def fiscalyear(dt):
    return dt.year if dt.month < 12 else dt.year+1
def fiscalmonth(dt):
    return dt.month+1 if dt.month < 12 else 1

# Read the data.

columns = []
data = []
for row in csv.reader(open('x.csv')):
    if not columns:
        columns = row
    else:
        nid = int(row[0])
        start = datetime.datetime.strptime(row[1], '%m/%d/%Y')
        end = datetime.datetime.strptime(row[2], '%m/%d/%Y')
        sal = float(row[3])
        data.append( (nid,start,end,sal) )

# Find minimum year.

first = min(fiscalyear(d[1]) for d in data)
last = 2023
headers = ['Year','ID','Dec','Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov']


def plan_a():
    # Write a CSV.

    with open('out.csv','w') as f:
        out = csv.writer(f)
        out.writerow(headers)
        for year in range(first,last+1):
            for nid,start,end,sal in data:
                if fiscalyear(start) <= year <= fiscalyear(end):
                    smo = fiscalmonth(start) if fiscalyear(start)== year else 1
                    emo = fiscalmonth(end) if fiscalyear(end) == year else 12
                    nmo = emo - smo + 1

                    row = [year,nid] + ['']*(smo-1) + [sal/12]*nmo + ['']*(12-emo)
                    out.writerow(row)

    # Read it.

    df = pd.read_csv( "out.csv" )
    print(df)

def plan_b():
    # Create a list.

    out = []
    for year in range(first,last+1):
        for nid,start,end,sal in data:
            if fiscalyear(start) <= year <= fiscalyear(end):
                smo = fiscalmonth(start) if fiscalyear(start)== year else 1
                emo = fiscalmonth(end) if fiscalyear(end) == year else 12
                nmo = emo - smo + 1

                row = [year,nid] + ['']*(smo-1) + [sal/12]*nmo + ['']*(12-emo)
                out.append(row)

    # Make a dataframe.

    df = pd.DataFrame(out, columns=headers)
    print(df)

plan_b()

Output:

     Year  ID          Dec  ...          Sep          Oct          Nov
0    1987  73               ...   471.404267   471.404267   471.404267
1    1988  56               ...  3709.038667  3709.038667  3709.038667
2    1988  73   471.404267  ...   471.404267   471.404267   471.404267
3    1988  94               ...  4433.433492  4433.433492  4433.433492
4    1989  56  3709.038667  ...  3709.038667  3709.038667  3709.038667
..    ...  ..          ...  ...          ...          ...          ...
200  2023  70  6860.585783  ...  6860.585783  6860.585783  6860.585783
201  2023  73   471.404267  ...   471.404267   471.404267   471.404267
202  2023  85     6369.825  ...     6369.825     6369.825     6369.825
203  2023  94  4433.433492  ...  4433.433492  4433.433492  4433.433492
204  2023  96  5331.354233  ...  5331.354233  5331.354233  5331.354233

[205 rows x 14 columns]

The unused plan_a there writes the data to a CSV and then reads the CSV. That would be nice if you needed the CSV for other things.

Upvotes: 0

Related Questions