Reputation: 730
I have a pandas dataframe with a column whose values are the names of columns in another pandas dataframe. So for example:
a
var1
var3
var2
...
Then another dataframe that looks like this:
var1 var2 var3 ...
5 8 9 ...
9 3 2 ...
...
The result I want is a numpy array in which each row corresponds to a row of the first dataframe and holds the time series from the second dataframe's column named in that row.
In this example I'd expect the result to be:
5 9 ...
9 2 ...
8 3 ...
...
I know I can do .loc like this:
df1.loc[df1["a"] == "var1", "new_col"] = df2["var1"]
df1.loc[df1["a"] == "var2", "new_col"] = df2["var2"]
df1.loc[df1["a"] == "var3", "new_col"] = df2["var3"]
...
or something like that. I know this wouldn't work out of the box, because each assignment would need to expand into several columns. Even if it did work, or if np.where could be used this way, I want to avoid it: there are a LOT of variables here, and I'm trying to see if I can do a map-style approach instead. Thanks in advance.
Upvotes: 1
Views: 1510
Reputation: 35676
df3 = df2.T.reindex(df1['a'])
df3:
      0  1
a
var1  5  9
var3  9  2
var2  8  3
import pandas as pd
df1 = pd.DataFrame({'a': ['var1', 'var3', 'var2']})
df2 = pd.DataFrame({'var1': [5, 9], 'var2': [8, 3], 'var3': [9, 2]})
df3 = df2.T.reindex(df1['a'])
print(df3)
With repetitions:
import pandas as pd
df1 = pd.DataFrame({'a': ['var1', 'var3', 'var2', 'var3', 'var1']})
df2 = pd.DataFrame({'var1': [5, 9], 'var2': [8, 3], 'var3': [9, 2]})
df3 = df2.T.reindex(df1['a'])
print(df3)
df3:
      0  1
a
var1  5  9
var3  9  2
var2  8  3
var3  9  2
var1  5  9
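The question asks for a NumPy array rather than a DataFrame. As a minimal sketch, assuming the same df1 and df2 as above, DataFrame.to_numpy() converts the reindexed frame. The frame df1_missing and the name 'var4' are made up here only to show that reindex fills unknown names with NaN instead of raising.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': ['var1', 'var3', 'var2', 'var3', 'var1']})
df2 = pd.DataFrame({'var1': [5, 9], 'var2': [8, 3], 'var3': [9, 2]})

# One row per entry of df1['a'], one column per row of df2.
arr = df2.T.reindex(df1['a']).to_numpy()
print(arr.shape)  # (5, 2)

# Hypothetical case: 'var4' is not a column of df2.
# reindex does not raise; it returns a row of NaN for that name.
df1_missing = pd.DataFrame({'a': ['var1', 'var4']})
arr_missing = df2.T.reindex(df1_missing['a']).to_numpy()
print(np.isnan(arr_missing[1]).all())  # True
```

If silent NaN rows are not acceptable, checking df1['a'].isin(df2.columns) before reindexing is one way to catch unknown names early.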
Upvotes: 1
Reputation: 13447
According to your last comment, column a of your first dataframe can contain repetitions. A possible approach could be the following:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({"a": ["var1", "var3", "var2", "var1"]})
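df2 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   columns=["var1", "var2", "var3"])

# Map each column name to that column's values as a NumPy array
diz = {}
for col in df2.columns:
    diz[col] = df2[col].values

# Look up every name in df1["a"], then stack the arrays row by row
mat = np.array(df1["a"].map(diz).tolist())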
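A possible variant of the same idea that skips the dictionary: translate the names into column positions and let NumPy do the selection. This is only a sketch under the same df1 and df2 as above. Index.get_indexer returns -1 for names it cannot find, so the explicit check and its error message are an addition for illustration, not part of the original approach.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": ["var1", "var3", "var2", "var1"]})
df2 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   columns=["var1", "var2", "var3"])

# Position of each requested name within df2.columns (-1 if absent).
idx = df2.columns.get_indexer(df1["a"])
if (idx == -1).any():
    raise KeyError("some names in df1['a'] are not columns of df2")

# Pick the columns by position, then transpose so rows follow df1['a'].
mat = df2.to_numpy()[:, idx].T
print(mat)
```

Repeated names are handled naturally, because the same position can appear in idx more than once.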
Upvotes: 0
Reputation: 571
Try this:
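import pandas as pd

df1 = pd.DataFrame({'a': ['var1', 'var3', 'var2']})
df2 = pd.DataFrame({'var1': [5, 9], 'var2': [8, 3], 'var3': [9, 2]})
# Select df2's columns in the order given by df1['a'], then transpose
new_df = df2[df1['a']].T
print(new_df)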
Output:
      0  1
var1  5  9
var3  9  2
var2  8  3
Upvotes: 0
Reputation: 23156
Assuming df1 contains the order of the rows and df2 is the data, you can do:
>>> df2[df1["a"].tolist()].T
      0  1
var1  5  9
var3  9  2
var2  8  3
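To get the NumPy array the question asks for, the result can be converted with to_numpy(). The sketch below assumes the same df1 and df2 as in the question's example. The name "var4" is made up to illustrate that selecting by a list of labels raises a KeyError for unknown names rather than returning NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"a": ["var1", "var3", "var2"]})
df2 = pd.DataFrame({"var1": [5, 9], "var2": [8, 3], "var3": [9, 2]})

# Select columns in the order given by df1["a"], transpose, convert.
result = df2[df1["a"].tolist()].T.to_numpy()
print(result)

# Hypothetical case: "var4" is not a column of df2, so selection fails.
try:
    df2[["var1", "var4"]]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```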
Upvotes: 2