Reputation: 197

Pulling column values based on conditions

I have the following dataframe

import pandas as pd

df = pd.DataFrame({
    'Column_1': ['Position', 'Start', 'End', 'Position'], 
    'Original_1': ['Open', 'Barn', 'Grass', 'Bubble'], 
    'Latest_1': ['Shut', 'Horn', 'Date', 'Dinner'], 
    'Column_2': ['Start', 'Position', 'End', 'During'], 
    'Original_2': ['Sky', 'Hold', 'Car', 'House'], 
    'Latest_2': ['Pedal', 'Lap', 'Two', 'Force'], 
    'Column_3': ['Start', 'End', 'Position', 'During'], 
    'Original_3': ['Leave', 'Dog', 'Block', 'Hope'], 
    'Latest_3': ['Sear', 'Crawl', 'Enter', 'Night']
})

For every instance where the word Position is in 'Column_1', 'Column_2', or 'Column_3', I want to capture the associated values in 'Original_1', 'Original_2', 'Original_3' and assign them to the new column named 'Original_Values'.

The following code can accomplish that, but only on a column by column basis.

df['Original_Value1'] = df.loc[df['Column_1'] == 'Position', 'Original_1']
df['Original_Value2'] = df.loc[df['Column_2'] == 'Position', 'Original_2']
df['Original_Value3'] = df.loc[df['Column_3'] == 'Position', 'Original_3']

Is there a way to recreate the above code so that it iterates over the entire data frame (not by specified columns)?

I'm hoping to create one column ('Original_Values') with the following result:

0      Open
1      Hold
2     Block
3    Bubble
Name: Original_Values, dtype: object

Upvotes: 1

Answers (3)

Anton vBR

Reputation: 18916

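How about creating a boolean mask from the three 'Column_' columns and multiplying it by the values of the three 'Original_' columns? True * 'Open' gives 'Open' and False * 'Open' gives an empty string, so taking the row-wise max() keeps the matching value and discards the empty strings. Selecting the columns by name is safer than slicing by position (iloc[:, :3] and iloc[:, 6:9]), because the positional slices only line up when the columns happen to be sorted alphabetically.

mask_cols = ['Column_1', 'Column_2', 'Column_3']
val_cols = ['Original_1', 'Original_2', 'Original_3']

df['Original_Values'] = ((df[mask_cols] == 'Position') * df[val_cols].values).max(axis=1)

print(df['Original_Values'])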

Returns:

0      Open
1      Hold
2     Block
3    Bubble
Name: Original_Values, dtype: object
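
A variant of the same masking idea, sketched here on the assumption that NaN rather than an empty string is wanted when no column matches: DataFrame.where() blanks out the non-matching cells and bfill(axis=1) pulls the first surviving value to the front. The names mask_cols, val_cols, mask and picked are illustrative, and the 'Latest_' columns are left out to keep the example short.

```python
import pandas as pd

df = pd.DataFrame({
    'Column_1': ['Position', 'Start', 'End', 'Position'],
    'Original_1': ['Open', 'Barn', 'Grass', 'Bubble'],
    'Column_2': ['Start', 'Position', 'End', 'During'],
    'Original_2': ['Sky', 'Hold', 'Car', 'House'],
    'Column_3': ['Start', 'End', 'Position', 'During'],
    'Original_3': ['Leave', 'Dog', 'Block', 'Hope'],
})

mask_cols = ['Column_1', 'Column_2', 'Column_3']
val_cols = ['Original_1', 'Original_2', 'Original_3']

# Boolean array, True where the flag column holds 'Position'
mask = (df[mask_cols] == 'Position').to_numpy()

# Keep only the values whose flag matched; everything else becomes NaN
picked = df[val_cols].where(mask)

# Back-fill across each row so the first column holds the first match
df['Original_Values'] = picked.bfill(axis=1).iloc[:, 0]
```

If more than one column holds 'Position' in the same row, this sketch keeps the leftmost match.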

Upvotes: 2

ALollz

Reputation: 59549

Here's a kinda silly way to do it with some stacking, which might perform better if you have a very large df and need to avoid axis=1.

1. Stack the mask columns to build a list of (row index, column) pairs, which tells you which 'Original' column each match corresponds to.
2. Stack the columns that hold the values you want, and reindex the result with that list so it returns the appropriate value for each row.
3. Bring those values back to the original df, aligned on the original row index.

Here's the code:

import re

mask_list = ['Column_1', 'Column_2', 'Column_3']
val_list = ['Original_1', 'Original_2', 'Original_3']

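# (row, column) pairs where a mask column equals 'Position'
stacked = df[mask_list].stack()
idx = stacked[stacked == 'Position'].index.tolist()
# Swap each 'Column_N' label for the matching 'Original_N' label
idx = [(x, re.sub('(.*_)', 'Original_', y)) for x, y in idx]

# Look up those pairs, then drop the column level so the result aligns on the row index
df['Original_Values'] = df[val_list].stack().reindex(idx).reset_index(level=1).drop(columns='level_1')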

df is now:

   Column_1  Column_2  Column_3 ... Original_Values
0  Position     Start     Start ...            Open
1     Start  Position       End ...            Hold
2       End       End  Position ...           Block
3  Position    During    During ...          Bubble

If 'Position' is not found in any of the columns in mask_list, Original_Values becomes NaN for that row. If you need to scale it to more columns, simply add them to mask_list and val_list.
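
As a rough sketch of how that scaling might look, the two lists can be derived from the column names instead of being typed out. This assumes every 'Column_N' has a matching 'Original_N'; the names to_val, hits and target are illustrative, and the frame is trimmed down so that row 2 has no 'Position' and shows the NaN behaviour.

```python
import pandas as pd

# Trimmed-down frame; row 2 deliberately has no 'Position'
df = pd.DataFrame({
    'Column_1': ['Position', 'Start', 'End'],
    'Original_1': ['Open', 'Barn', 'Grass'],
    'Column_2': ['Start', 'Position', 'End'],
    'Original_2': ['Sky', 'Hold', 'Car'],
})

# Derive both lists from the naming convention
mask_list = [c for c in df.columns if c.startswith('Column_')]
val_list = [c.replace('Column_', 'Original_', 1) for c in mask_list]
to_val = dict(zip(mask_list, val_list))

# (row, column) pairs where a mask column equals 'Position'
stacked = df[mask_list].stack()
hits = stacked[stacked == 'Position'].index.tolist()

# Same pairs, relabelled to point at the value columns
target = pd.MultiIndex.from_tuples([(row, to_val[col]) for row, col in hits])

# Row 2 never appears in target, so index alignment leaves it as NaN
df['Original_Values'] = df[val_list].stack().reindex(target).droplevel(1)
```

Using a dict lookup in place of the regex is only a design preference here; it avoids surprises if a column name contains more than one underscore.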

Upvotes: 0

gbrener

Reputation: 5835

One way to do it, with df.apply():

def choose_orig(row):
    if row['Column_1'] == 'Position':
        return row['Original_1']
    elif row['Column_2'] == 'Position':
        return row['Original_2']
    elif row['Column_3'] == 'Position':
        return row['Original_3']
    return ''

df['Original_Values'] = df.apply(choose_orig, axis=1)

The axis=1 argument to df.apply() causes the choose_orig() function to be called once for each row of the dataframe.

Note that this uses a default value of the empty string, '', when none of the columns match the word 'Position'.
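
If there are more than three column pairs, the if/elif chain could be replaced by a loop. The following is only a sketch: the pairs list and the keyword and default parameters are names introduced for this example, and the 'Latest_' columns are omitted for brevity.

```python
import pandas as pd

df = pd.DataFrame({
    'Column_1': ['Position', 'Start', 'End', 'Position'],
    'Original_1': ['Open', 'Barn', 'Grass', 'Bubble'],
    'Column_2': ['Start', 'Position', 'End', 'During'],
    'Original_2': ['Sky', 'Hold', 'Car', 'House'],
    'Column_3': ['Start', 'End', 'Position', 'During'],
    'Original_3': ['Leave', 'Dog', 'Block', 'Hope'],
})

# (flag column, value column) pairs, checked in this order
pairs = [
    ('Column_1', 'Original_1'),
    ('Column_2', 'Original_2'),
    ('Column_3', 'Original_3'),
]

def choose_orig(row, pairs, keyword='Position', default=''):
    # Return the value paired with the first flag column equal to keyword
    for flag_col, val_col in pairs:
        if row[flag_col] == keyword:
            return row[val_col]
    return default

# Extra keyword arguments to df.apply() are forwarded to choose_orig()
df['Original_Values'] = df.apply(choose_orig, axis=1, pairs=pairs)
```

As with the original function, the first matching pair wins, and rows with no match get the default value.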

Upvotes: 2
