ascripter
ascripter

Reputation: 6213

pandas string operations: Dynamically select column depending on other columns content

I have a DataFrame with the following structure:

df = pd.DataFrame({
    'variable': ['foo 1 bar', 'foo 2 bar', 'foo 3 bar'],
    'lookup 1': [1, 2, 3],
    'lookup 2': [11, 12, 13],
    'lookup 3': [21, 22, 23],
})

I want to add a new column that selects for each row the lookup <i> column that corresponds to foo <i> bar:

   lookup 1  lookup 2  lookup 3   variable  result
0         1        11        21  foo 1 bar       1
1         2        12        22  foo 2 bar      12
2         3        13        23  foo 3 bar      23

I've been trying to extract i via pandas.Series.str.extract which returns only df.lookup1:

# doesn't work
df.result = df.loc[:, lambda df: "lookup " + df.variable.str.extract("(\d)")]

Also trying to concatenate string with pandas.Series.str.cat doesn't work this way:

# doesn't work either
df.result = df.loc[:, lambda x: pd.Series(['lookup '] * x.shape[0]).str.cat(x.variable.str.extract("(\d)"))]

Upvotes: 1

Views: 350

Answers (3)

ascripter
ascripter

Reputation: 6213

Generalising / expanding @rafaelc's solution a bit I ended up with the following:

# explicitly define which variable value maps to which column
mapper_name = {'foo 1 bar': 'lookup 1',
               'foo 2 bar': 'lookup 2',
               'foo 3 bar': 'lookup 3'}

# map all names to their corresponding column-index
mapper_col_idx = dict(zip(df.columns, np.arange(df.shape[1])))

# apply both mappings to get a Series of column indices, i.e.
# 'foo 1 bar' --maps-to--> 'lookup 1' --maps-to--> 0
col_idx = df.variable.map(mapper_name).map(mapper_col_idx)

# access the dataframe's array via the col_idx array
df['result'] = df.values[np.arange(df.shape[0]), col_idx]

Upvotes: 0

rafaelc
rafaelc

Reputation: 59274

May also use numpy indexing, which might be faster

r, c = df.shape
mapper = dict(zip(df.columns.str.get(-1), 
                  np.arange(c)))

df.values[np.arange(r), 
          df.variable.str.get(4).map(mapper)]

Upvotes: 2

Scott Boston
Scott Boston

Reputation: 153460

Try this:

df['result'] = df.lookup(df.index, 
                         ('lookup '+ df.variable.str.extract('(\d+)'))[0].tolist())

Output:

    variable  lookup 1  lookup 2  lookup 3  result
0  foo 1 bar         1        11        21       1
1  foo 2 bar         2        12        22      12
2  foo 3 bar         3        13        23      23

Upvotes: 1

Related Questions