Frank

Reputation: 61

Dynamically Generating Variables for Dataframe values Assignment

I am trying to make the code below more dynamic. I have left out everything that is not relevant, for simplicity. The overall concept of the program is as follows. The objective is to build linear models from random columns: x random columns are selected, and those columns are used to build a linear model. That model is then run on a test dataset, and the relevant information is captured in a dataframe. This is repeated a large number of times. What I would like to do is generate the code that assigns the values to the dataframe dynamically, based on the number of columns selected. Otherwise I need to keep the number of columns selected static, with the consequence that I have to babysit the program while it runs and manually adjust the number of columns selected.

The following code is what I would like to generate dynamically: test_df.loc[i,asignment_list[ii]] = i.

The example code below only calls for 3 random columns.

import pandas as pd

test_df = pd.DataFrame(columns = ['a','b','c','d','e','f','g','h','i','j'])


for i in range(10):
    asignment_list = list(test_df.sample(n = 3, replace = True, axis = 1))

    test_df.loc[i,asignment_list[0]] = i
    test_df.loc[i,asignment_list[1]] = i
    test_df.loc[i,asignment_list[2]] = i

print(test_df)

Output:

[image of the printed test_df not included]

I tried the piece of code below, but it requires that I call the variable name, which can't be done dynamically.

for ii in range(0,3,1):
    globals()[f'test_df.loc[{i},asignment_list[{ii}]'] = i

If Python does not have this functionality, could I build it into Python with C?
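For reference, `globals()[name] = value` only binds a global variable whose name is that string; it does not evaluate the string as an assignment, so `test_df` is left untouched. A minimal sketch of the same assignment written as an ordinary loop over the selected names follows. It assumes the rows are pre-allocated with `index=range(10)` and that the columns are sampled without replacement so the three names are distinct; neither choice comes from the code above.

```python
import pandas as pd

# Assumption: pre-allocate the ten rows so every .loc target already exists.
test_df = pd.DataFrame(index=range(10), columns=list('abcdefghij'))

for i in range(10):
    # Assumption: sample without replacement so the column names are distinct.
    assignment_list = list(test_df.sample(n=3, axis=1).columns)
    # One assignment per selected column, however many were selected.
    for col in assignment_list:
        test_df.loc[i, col] = i

print(test_df)
```

Changing `n` changes how many columns are filled without touching the loop body. `.loc` also accepts the whole list in one statement, as in `test_df.loc[i, assignment_list] = i`.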

Upvotes: 1

Views: 122

Answers (2)

Frank

Reputation: 61

Sorry for any confusion, and thanks for the questions and responses. I found a solution. If anyone finds another way I would like to see it, because this answer was a stretch for me. It is below, with a note on its application. The solution feeds the output of zip() into dict() and then into a pandas DataFrame. After the first iteration, it concatenates the DataFrame built up over the previous iterations with the newly constructed DataFrame, and the result is assigned back to the same variable name.

Note on application: Given a wide table of data and a desire to build a model from it, it can be useful to select a random set of columns for each model that is built. This should help prevent overfitting and show which variables have the most consistent effect on the model.

import pandas as pd
import random

test_df = pd.DataFrame(columns = ['a','b','c','f','j'])


for i in range(0,10,1):
    columns_number_selected = len(test_df.columns)
    # Pick how many columns to use this iteration: at least 1, fewer than all.
    Randum_Columns = random.randrange(1, columns_number_selected)

    # Sample that many column names; with replace = True a name can repeat.
    assignment_list = list(test_df.sample(n = Randum_Columns, replace = True, axis = 1))
    # Map every selected column to the value i (repeated names collapse in the dict).
    data = dict(zip(assignment_list, [i] * len(assignment_list)))

    if i == 0:
        answer1 = pd.DataFrame(data, index = [i])

    else:
        # Build a one-row frame and append it to the rows collected so far.
        answer2 = pd.DataFrame(data, index = [i])
        answer1 = pd.concat([answer1, answer2])

print(answer1)

Example of a possible output:

[image of the printed answer1 not included]
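Another way to get a similar result, sketched here under a few assumptions, is to collect one dict per iteration and build the DataFrame once at the end instead of concatenating inside the loop. The names `columns`, `rows`, `selected` and `answer` are illustrative, the fixed seed is only there to make the run repeatable, and `random.sample` is swapped in so that the selected names are distinct.

```python
import random
import pandas as pd

columns = ['a', 'b', 'c', 'f', 'j']  # same column names as the example above
random.seed(0)  # assumption: fixed seed only so the sketch is repeatable

rows = []
for i in range(10):
    # Choose between 1 and len(columns) distinct column names.
    k = random.randint(1, len(columns))
    selected = random.sample(columns, k)
    # dict.fromkeys maps every selected column to the same value i.
    rows.append(dict.fromkeys(selected, i))

# Build the frame once; columns missing from a row become NaN.
answer = pd.DataFrame(rows, columns=columns)
print(answer)
```

Building the frame once avoids repeated `pd.concat` calls, which copy the accumulated data on every iteration.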

Upvotes: 0

sitting_duck

Reputation: 3720

Is this what you are looking for?

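import random
import pandas as pd

tgt_cols = list('abcdef')
dfs = pd.DataFrame(index=range(10))

for c in tgt_cols:
    # Draw three random row labels (repeats are possible).
    r = [random.randint(0,9) for _ in range(3)]
    # Use each value as its own index label and drop repeats.
    s = pd.Series(r, index=r).drop_duplicates()
    # Assignment aligns on the index; rows that were not drawn stay NaN.
    dfs[c] = s

print(dfs)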

Result

     a    b    c    d    e    f
0  0.0  0.0  0.0  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN  1.0  NaN
2  NaN  NaN  NaN  NaN  NaN  NaN
3  NaN  3.0  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN  NaN  4.0
5  NaN  NaN  NaN  NaN  NaN  5.0
6  6.0  NaN  6.0  NaN  NaN  NaN
7  NaN  NaN  NaN  7.0  7.0  NaN
8  NaN  8.0  8.0  NaN  8.0  NaN
9  9.0  NaN  NaN  9.0  NaN  NaN

Upvotes: 1
