Reputation: 835
I have 2 pandas dataframes (df1, df2) that I am trying to extract data from and create a 3rd dataframe (df3)
df1 has 2 columns (an id column and another column that hold the name of the columns in the 2nd dataframe (df2)
df1 looks like:
===============
id1 name
--- ----
1 df2_column1_name
5 df2_column1_name
33 df2_column3_name
...
... and so on
df2 looks like:
===============
id2 df2_column1_name df2_column2_name df2_column2_name .... and so on
--- ---------------- ---------------- ----------------
12 Jimmy male 25 ....
16 Becky female 30 ....
75 Mike male 80 ....
....
.... and so on
I am trying to create df3 to look like:
=======================================
column1 Column2 Column3
------- ------- -------
1 12 Jimmy
5 12 male
33 12 25
.
.
1 16 Becky
5 16 female
33 16 30
.
.
1 75 Mike
5 75 male
33 75 80
.
.
.
The dataframes can be quite large. I am trying to figure out the most efficient way to do this without double looping if possible. please advise best way to do this. Thank you
Upvotes: 1
Views: 43
Reputation: 375415
stack and a merge gets you most of the way there:
In [11]: df2.set_index("id2").stack().reset_index(name='value')
Out[11]:
id2 level_1 value
0 12 df2_column1_name Jimmy
1 12 df2_column2_name male
2 12 df2_column3_name 25
3 16 df2_column1_name Becky
4 16 df2_column2_name female
5 16 df2_column3_name 30
6 75 df2_column1_name Mike
7 75 df2_column2_name male
8 75 df2_column3_name 80
In [12]: df2.set_index("id2").stack().reset_index(name='value').merge(df1, right_on="name", left_on="level_1")
Out[12]:
id2 level_1 value id1 name
0 12 df2_column1_name Jimmy 1 df2_column1_name
1 16 df2_column1_name Becky 1 df2_column1_name
2 75 df2_column1_name Mike 1 df2_column1_name
3 12 df2_column2_name male 5 df2_column2_name
4 16 df2_column2_name female 5 df2_column2_name
5 75 df2_column2_name male 5 df2_column2_name
6 12 df2_column3_name 25 33 df2_column3_name
7 16 df2_column3_name 30 33 df2_column3_name
8 75 df2_column3_name 80 33 df2_column3_name
Lastly you have to select only the columns you want and sort:
In [13]: df2.set_index("id2").stack().reset_index(name='value').merge(df1, right_on="name", left_on="level_1")[["id1", "id2", "value"]].sort_v
...: alues("id2")
Out[13]:
id1 id2 value
0 1 12 Jimmy
3 5 12 male
6 33 12 25
1 1 16 Becky
4 5 16 female
7 33 16 30
2 1 75 Mike
5 5 75 male
8 33 75 80
Upvotes: 1