Reputation: 6213
I have a DataFrame with the following structure:
df = pd.DataFrame({
'variable': ['foo 1 bar', 'foo 2 bar', 'foo 3 bar'],
'lookup 1': [1, 2, 3],
'lookup 2': [11, 12, 13],
'lookup 3': [21, 22, 23],
})
I want to add a new column that selects for each row the lookup <i>
column that corresponds to foo <i> bar
:
lookup 1 lookup 2 lookup 3 variable result
0 1 11 21 foo 1 bar 1
1 2 12 22 foo 2 bar 12
2 3 13 23 foo 3 bar 23
I've been trying to extract i
via pandas.Series.str.extract
which returns only df.lookup1
:
# doesn't work
df.result = df.loc[:, lambda df: "lookup " + df.variable.str.extract("(\d)")]
Also trying to concatenate string with pandas.Series.str.cat
doesn't work this way:
# doesn't work either
df.result = df.loc[:, lambda x: pd.Series(['lookup '] * x.shape[0]).str.cat(x.variable.str.extract("(\d)"))]
Upvotes: 1
Views: 350
Reputation: 6213
Generalising / expanding @rafaelc's solution a bit I ended up with the following:
# explicitly define which variable value maps to which column
mapper_name = {'foo 1 bar': 'lookup 1',
'foo 2 bar': 'lookup 2',
'foo 3 bar': 'lookup 3'}
# map all names to their corresponding column-index
mapper_col_idx = dict(zip(df.columns, np.arange(df.shape[1])))
# apply both mappings to get a Series of column indices, i.e.
# 'foo 1 bar' --maps-to--> 'lookup 1' --maps-to--> 0
col_idx = df.variable.map(mapper_name).map(mapper_col_idx)
# access the dataframe's array via the col_idx array
df['result'] = df.values[np.arange(df.shape[0]), col_idx]
Upvotes: 0
Reputation: 59274
May also use numpy
indexing, which might be faster
r, c = df.shape
mapper = dict(zip(df.columns.str.get(-1),
np.arange(c)))
df.values[np.arange(r),
df.variable.str.get(4).map(mapper)]
Upvotes: 2
Reputation: 153460
Try this:
df['result'] = df.lookup(df.index,
('lookup '+ df.variable.str.extract('(\d+)'))[0].tolist())
Output:
variable lookup 1 lookup 2 lookup 3 result
0 foo 1 bar 1 11 21 1
1 foo 2 bar 2 12 22 12
2 foo 3 bar 3 13 23 23
Upvotes: 1