Function for DataFrame operation using variables in the list with Python

Question

I have a list, list = ['OUT', 'IN'], where each element is the prefix of a column name in the data frame, with the suffixes _3M, _6M, _9M and _15M attached to it.

List: list = ['OUT', 'IN']

Input_df:

ID  OUT_3M  OUT_6M  OUT_9M  OUT_15M  IN_3M  IN_6M  IN_9M  IN_15M
A   2       3       4       6        2      3      4      6
B   3       3       5       7        3      3      5      7
C   2       3       6       6        2      3      6      6
D   3       3       7       7        3      3      7      7

The problem I am trying to solve is computing the difference between consecutive columns:

1. Subtracting OUT_3M from OUT_6M and entering the result into a separate column, Out_3M-6M

2. Subtracting OUT_6M from OUT_9M and entering the result into a separate column, Out_6M-9M

3. Subtracting OUT_9M from OUT_15M and entering the result into a separate column, Out_9M-15M

The same is repeated for every element in the list, while keeping OUT_3M and IN_3M as they are, as shown in the sample Output_df dataset.

Output_df:

ID  Out_3M  Out_3M-6M  Out_6M-9M  Out_9M-15M  IN_3M  IN_3M-6M  IN_6M-9M  IN_9M-15M
A   2       1          1          2           2      1         1         2
B   3       0          2          2           3      0         2         2
C   2       1          3          0           2      1         3         0
D   3       0          4          0           3      0         4         0

There are many elements in the list that I need to perform this operation on. Is there any way I could solve this by writing a function? Thanks!

Marco Spinaci · Accepted Answer

I'm not sure what you mean by writing a function; aren't a couple of for loops enough for what you want to do? Something like:

postfixes = ['3M','6M','9M','15M']
prefixes = ['IN','OUT']

# Allocate the space, while also copying _3M
output_df = input_df.copy()

# Rename a few
output_df.rename(columns={'_'.join((prefix, postfixes[i])): '_'.join((prefix, postfixes[i-1] + '-' + postfixes[i]))
                          for prefix in prefixes for i in range(1, len(postfixes))}, inplace=True)


# Compute the differences (later period minus earlier one, as in the sample Output_df)
for prefix in prefixes:
    for i in range(1, len(postfixes)):
        # Build the same name the rename step produced, e.g. 'OUT_3M-6M'
        postfix = postfixes[i-1] + '-' + postfixes[i]
        output_df['_'.join((prefix, postfix))] = input_df['_'.join((prefix, postfixes[i]))].values - input_df['_'.join((prefix, postfixes[i-1]))].values

The output_df starts out as a copy of input_df, both to avoid dealing with the _3M case separately and to pre-allocate the DataFrame instead of creating the columns one at a time (it doesn't matter in your code, but if you had thousands of columns it would otherwise waste time moving stuff around in memory).
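Since the question asks for a function, the same idea can also be wrapped in one. What follows is only a sketch, not the only way to do it: the function name consecutive_differences and its id_col parameter are made up for illustration, the sample frame is retyped from Input_df, and the differences are taken as later minus earlier so that the result matches the sample Output_df. For brevity it builds the columns one at a time instead of pre-allocating.

```python
import pandas as pd


def consecutive_differences(df, prefixes, suffixes, id_col="ID"):
    """Keep <prefix>_<first suffix> and add later-minus-earlier differences.

    Hypothetical helper: the name and the id_col parameter are assumptions.
    """
    out = df[[id_col]].copy()
    for prefix in prefixes:
        # Keep the first period unchanged, e.g. OUT_3M
        first = f"{prefix}_{suffixes[0]}"
        out[first] = df[first]
        # Pair each suffix with the next one: (3M, 6M), (6M, 9M), (9M, 15M)
        for earlier, later in zip(suffixes, suffixes[1:]):
            out[f"{prefix}_{earlier}-{later}"] = (
                df[f"{prefix}_{later}"] - df[f"{prefix}_{earlier}"]
            )
    return out


# Sample data retyped from Input_df in the question
input_df = pd.DataFrame({
    "ID": ["A", "B", "C", "D"],
    "OUT_3M": [2, 3, 2, 3], "OUT_6M": [3, 3, 3, 3],
    "OUT_9M": [4, 5, 6, 7], "OUT_15M": [6, 7, 6, 7],
    "IN_3M": [2, 3, 2, 3], "IN_6M": [3, 3, 3, 3],
    "IN_9M": [4, 5, 6, 7], "IN_15M": [6, 7, 6, 7],
})

output_df = consecutive_differences(input_df, ["OUT", "IN"], ["3M", "6M", "9M", "15M"])
print(output_df)
```

To cover more variables, extend the prefixes argument; the suffix list only needs to be in chronological order.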

Also, you should avoid calling a list "list" or you're going to get some nasty-to-find bugs along the way when you're trying to convert a tuple to a list!
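To make the shadowing problem concrete, here is a small sketch; both function names are invented for the illustration. Inside the first function, the name list refers to the list object that was just assigned, so calling it fails.

```python
def convert_with_shadowing(values):
    list = ['OUT', 'IN']      # the name "list" now points at this list object
    return list(values)       # TypeError: 'list' object is not callable


def convert_without_shadowing(values):
    prefixes = ['OUT', 'IN']  # a different name leaves the built-in untouched
    return list(values)       # built-in list() converts the tuple as intended


try:
    convert_with_shadowing(('3M', '6M'))
except TypeError as exc:
    print("shadowed:", exc)

print("fixed:", convert_without_shadowing(('3M', '6M')))
```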
