Python: for loop iteration through objects, not strings

Question

I used pd.get_dummies to one-hot encode my categorical variables for use as predictors. Some of my columns had many unique values, so I now have many new columns, and I am trying to find a fast way to create interaction terms for them. (I only want interactions for a subset of my columns, so PolynomialFeatures() won't work... or will it?)

Here is what I am trying to do:

Step 1: Create lists of column names for each of the subset I want to multiply:

channel = [col for col in df if col.startswith('channel')]
quote = [col for col in df if col.startswith('quote')]

print(channel[:2])
Out: ['channel_A', 'channel_B']

Step 2: for loop:

cols = 'channel quote'.split()
for col in cols:
    for i in col:
        colname = 'value_X_'+i
        df[colname] = df['value_days']*df[i]+0

The problem is that the inner loop does not treat col as the list object of that name; it treats it as a string (error = 'c'), as evidenced by:

for col in cols:
    for i in col:
        print(i)

Out[1]: 
c
h
.
.
.
o
t
e

Goal: My desired outcome is a new column that is named for the two columns that were multiplied and that holds the values of their product.

For example, the first element in channel is channel_A, so I want to get a new column named value_X_channel_A and it should have values that are equivalent to the product of value_days*channel_A.

value_days | channel_A | value_X_channel_A
-------------------------------------------
5          |5          |25

This works perfectly fine if I just run the inner loop and replace col with channel.

How can I get this to work?

Thanks in advance.

ICW · Accepted Answer

Your question is worded in a way that is hard to understand (for me at least). If I understand correctly, you want to multiply each column whose name starts with "channel" or "quote" by the column "value_days" in your df, and store the result in a new column named value_X_{i}, where {i} is the name of the column that was multiplied. You're close, but your code is awkward. Use another data structure (a dictionary) to make the code straightforward and readable:

d = { 
    'quote' : [col for col in df if col.startswith('quote')],
    'channel' : [col for col in df if col.startswith('channel')]
}

for columns_string, columns in d.items():
    for col_string in columns:
        colname = 'value_X_'+col_string
        df[colname] = df['value_days'] * df[col_string] + 0

Explanation:

d = ... - Creates a dictionary with two key/value pairs. The keys are 'quote' and 'channel', and each value is the list of matching column names.

for columns_string, columns in d.items(): - .items() returns the dictionary's key/value pairs. The loop walks through them, binding each key to 'columns_string' and the corresponding list of column names to 'columns'.
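To see the loop run end to end, here is a minimal sketch on a toy DataFrame. The data values, the column quote_X, and the variable names groups, prefix and col_name are made up for illustration; only the column-naming pattern comes from the question.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror the pattern in the question.
df = pd.DataFrame({
    "value_days": [5, 3, 2],
    "channel_A": [1, 0, 0],
    "channel_B": [0, 1, 0],
    "quote_X": [1, 1, 0],
})

# Build the lists of column names once, before any new columns are added.
groups = {
    "quote": [col for col in df if col.startswith("quote")],
    "channel": [col for col in df if col.startswith("channel")],
}

for prefix, columns in groups.items():
    for col_name in columns:
        # The new column is value_days multiplied element-wise by the dummy column.
        df["value_X_" + col_name] = df["value_days"] * df[col_name]

# Show only the newly created interaction columns.
print(df.filter(like="value_X_"))
```

Because the name lists are built before the loop starts, the new value_X_ columns are not picked up and multiplied a second time.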

You can quickly see that something is wrong with your code by noticing that you create the variables channel and quote and set them to their corresponding lists, but you never actually use either list in the loop. 'channel quote'.split() yields only the strings 'channel' and 'quote', so the inner loop iterates over their characters.
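The difference between a name held as a string and the list that name refers to can be shown without pandas. This is only an illustrative sketch; the list contents and the names chars, lookup and names are assumptions, not taken from your data.

```python
# Hypothetical stand-ins for the lists built in Step 1 of the question.
channel = ["channel_A", "channel_B"]
quote = ["quote_X"]

# Iterating over the string "channel" walks through its characters,
# not through the list that happens to share its name.
chars = [i for i in "channel"]
print(chars)

# A dictionary explicitly links each name to the list it should stand for.
lookup = {"channel": channel, "quote": quote}
names = [i for key in "channel quote".split() for i in lookup[key]]
print(names)
```

The dictionary lookup is what turns the string 'channel' back into the list of column names, which is the step the original loop was missing.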
