zachvac

Reputation: 730

Pandas/Numpy Map Column of column names to values

I have a pandas dataframe with a column whose values are column names from another pandas dataframe. For example:

a
var1
var3
var2
...

Then another dataframe that looks like this:

var1   var2   var3   ...
5      8      9      ...
9      3      2      ...
...

The result I want is a numpy array whose rows correspond to the rows of the first dataframe, where each row holds the time series from the second dataframe for the column named in the first.

In this example here I'd expect the result to be:

5  9  ...
9  2  ...
8  3  ...
...

I know I can do .loc like this:

df1.loc[df1["a"] == "var1", "new_col"] = df2["var1"]
df1.loc[df1["a"] == "var2", "new_col"] = df2["var2"]
df1.loc[df1["a"] == "var3", "new_col"] = df2["var3"]
...

or something like that. I know this wouldn't work out of the box because the values would need to be expanded. But even if it did, or if np.where could work this way, I want to avoid it: there are a LOT of variables here, and I'm looking for a map-style approach. Thanks in advance.

Upvotes: 1

Views: 1510

Answers (4)

Henry Ecker

Reputation: 35676

Use T + reindex:

df3 = df2.T.reindex(df1['a'])

df3:

      0  1
a         
var1  5  9
var3  9  2
var2  8  3

Complete working example:

import pandas as pd

df1 = pd.DataFrame({'a': ['var1', 'var3', 'var2']})
df2 = pd.DataFrame({'var1': [5, 9], 'var2': [8, 3], 'var3': [9, 2]})

df3 = df2.T.reindex(df1['a'])

print(df3)

With repetitions:

import pandas as pd

df1 = pd.DataFrame({'a': ['var1', 'var3', 'var2', 'var3', 'var1']})
df2 = pd.DataFrame({'var1': [5, 9], 'var2': [8, 3], 'var3': [9, 2]})

df3 = df2.T.reindex(df1['a'])

print(df3)

df3:

      0  1
a         
var1  5  9
var3  9  2
var2  8  3
var3  9  2
var1  5  9
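
Since the question asks for a NumPy array rather than a DataFrame, the reindexed frame can be converted with to_numpy(). The sketch below reuses the sample data from this answer; the name 'varX' is hypothetical and is only there to show what I'd expect reindex to do when a name in df1 is not a column of df2 (a row of NaN rather than an error).

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['var1', 'var3', 'var2', 'var3', 'var1']})
df2 = pd.DataFrame({'var1': [5, 9], 'var2': [8, 3], 'var3': [9, 2]})

# One row per entry of df1['a']; one column per row of df2
arr = df2.T.reindex(df1['a']).to_numpy()
print(arr.shape)  # (5, 2)

# 'varX' is a made-up name that is not in df2: reindex fills that row with NaN
missing = df2.T.reindex(['var1', 'varX'])
print(missing.isna().any(axis=1).tolist())  # [False, True]
```

If silent NaN rows are not what you want, check the names in df1 against df2.columns first.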

Upvotes: 1

rpanai

Reputation: 13447

According to your last comment, you can have repetitions in column a of your first dataframe. A possible approach could be the following:

Sample Data

import pandas as pd
import numpy as np

df1 = pd.DataFrame({"a": ["var1", "var3", "var2", "var1"]})
df2 = pd.DataFrame(np.arange(12).reshape(4,3),
                   columns=["var1", "var2", "var3"])

Extract dictionary from df2

diz = {}
for col in df2.columns:
    diz[col] = df2[col].values

Use Map and transform to array

mat = np.array(df1["a"].map(diz).tolist())
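
As an alternative sketch on the same sample data (not part of the approach above), the lookup can also be done positionally: translate each name into its column position with Index.get_indexer, then use NumPy fancy indexing. The names pos and mat are just illustrative, and this assumes the column names of df2 are unique.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": ["var1", "var3", "var2", "var1"]})
df2 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   columns=["var1", "var2", "var3"])

# Position of each requested name in df2.columns; -1 would mean "not found"
pos = df2.columns.get_indexer(df1["a"].tolist())
if (pos < 0).any():
    raise KeyError("some names in df1['a'] are not columns of df2")

# Pick the columns in the requested order, then transpose so that
# each row of mat is the series for the matching row of df1
mat = df2.to_numpy()[:, pos].T
print(mat)
```

For this data the first and last rows of mat are both the var1 series, since var1 appears twice in df1.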

Upvotes: 0

Crystal L

Reputation: 571

Try this:

import pandas as pd

df1 = pd.DataFrame({'a':['var1','var3','var2']})
df2 = pd.DataFrame({'var1':[5,9],'var2':[8,3],'var3':[9,2]})

new_df = df2[df1['a']].T
print(new_df)

Output:

        0   1
var1    5   9
var3    9   2
var2    8   3

Upvotes: 0

not_speshal

Reputation: 23156

Assuming df1 contains the order of the rows and df2 is the data, you can do:

>>> df2[df1["a"].tolist()].T
      0  1
var1  5  9
var3  9  2
var2  8  3

Upvotes: 2
