Excalibur
Excalibur

Reputation: 67

How to set a default value for a specific column in which other columns interact with Pandas?

I have a DataFrame that creates a CSV with this function:

def create_data(date, place, value):
    can_spend = 190

    try:
        file = open(filename, 'r+')

        data_set = pd.read_csv(filename, index_col=False)

        frame = pd.DataFrame(data_set, columns=['Left', 'Date', 'Place', 'Spent'])
        frame = frame.append({"Left": can_spend, "Date": date, "Place": place, "Spent": value}, ignore_index=True)
        frame['Date'] = pd.to_datetime(frame['Date'])
        frame['Week'] = frame['Date'].dt.weekofyear

        # write the data-set to the csv
        frame.to_csv(filename, index=None, header=True)
    except IOError:
        file = open(filename, "w")

        frame = pd.DataFrame(columns=['Left', 'Date', 'Place', 'Spent'])
        frame.to_csv(filename, index=None, header=True)

This DataFrame is going to be storing a small portion of my personal budget. I have a set spending limit that I want each each entry in the frame to subtract from based on the week (the spending limit will reset each week).

Here is how I add data to the DataFrame:

def create_new_entry(self):
    get_date = input("Date: ")
    get_place = input("Place: ")
    get_amount = float(input("Amount: "))
    create_data(get_date, get_place, get_amount)


Here is how I would like the DataFrame to look:

"Left" column will default to the value of 190 each week

     Left        Date    Place  Spent  Week
0  146.69  2019-01-02  Walmart  43.31     1
1  92.46   2019-01-05  Kroger   54.23     1
2  72.46   2019-01-06  Kroger   20.00     1
# Here is where "Left" will reset on new week
3  170.00  2019-01-08  Kroger   20.00     2

How can I accomplish this?

Upvotes: 2

Views: 857

Answers (1)

Valentino
Valentino

Reputation: 7361

This can be done with groupby and cumsum with a single line of code. Do not add the 'Left' column while reading and creating the dataframe (I mean, you can, but it will be overwritten anyway).

Suppose then that, after reading and first manipulation to create the useful 'Week' column, your df is:

         Date    Place  Spent  Week
0  2019-01-02  Walmart  43.31     1
1  2019-01-05   Kroger  54.23     1
2  2019-01-06   Kroger  20.00     1
3  2019-01-08  Walmart  20.00     2
4  2019-01-09  Walmart  30.00     2
5  2019-01-10   Kroger  10.00     2

Then you can create the 'Left' column like:

can_spend = 190

df['Left'] = df.groupby('Week').apply(lambda x : can_spend - x['Spent'].cumsum()).reset_index(drop=True)

And df will become:

         Date    Place  Spent  Week    Left
0  2019-01-02  Walmart  43.31     1  146.69
1  2019-01-05   Kroger  54.23     1   92.46
2  2019-01-06   Kroger  20.00     1   72.46
3  2019-01-08  Walmart  20.00     2  170.00
4  2019-01-09  Walmart  30.00     2  140.00
5  2019-01-10   Kroger  10.00     2  130.00

A brief explanation: groupby creates subsets of the dataframe, grouping rows with the same value in column 'Week'. The apply method do the vectorized calculation to get the remaining amount for each subset (week). reset_index(drop=True) is needed otherwise the index builtd by groupby will not match with the index of df, raising an error.

Upvotes: 2

Related Questions