Lasith Niroshan
Lasith Niroshan

Reputation: 1083

Pandas Data Processing

I have two data frames as follow.

df1 :

A B C D E F
1 2 3 x y z

df2 :

A B C G H I
4 5 6 p q r

I wanted to make three data frames using these two like this.

ndf1 :

A B C ID
1 2 3 id_0
4 5 6 id_1

ndf2 :

ID D E F
id_0 x y z

ndf3 :

ID G H I
id_1 p q r

My questions are, Is this workflow possible with Python and Pandas? and What would be the best approach for that?

Upvotes: 0

Views: 137

Answers (2)

Pierre D
Pierre D

Reputation: 26221

I will differ a bit from your formulation, in order to:

  1. follow the convention of Unix/Linux's comm command.
  2. use the index of df1, resp. df2, to indicate where the rows come from (instead of creating id_0 etc.) If necessary, the index of df1 and df2 can be set to uniquely indicate provenance.

With this, you can do:

ix1 = df1.columns.difference(df2.columns)  # unique to df1
ix2 = df2.columns.difference(df1.columns)  # unique to df2
ix3 = df1.columns.intersection(df2.columns)  # common

u1 = df1[ix1]
u2 = df2[ix2]
common = pd.concat([df1[ix3], df2[ix3]])

On your example data:

>>> u1
   D  E  F
0  x  y  z

>>> u2
   G  H  I
0  p  q  r

>>> common
   A  B  C
0  1  2  3
0  4  5  6

Addendum

Here is an example of how to modify the index of df1 and df2 to indicate unambiguously provenance:

index1 = [f'df1_{i}' for i in df1.index]
index2 = [f'df2_{i}' for i in df2.index]

u1 = df1.set_axis(index1)[ix1]
u2 = df2.set_axis(index2)[ix2]
common = pd.concat([df1.set_axis(index1)[ix3], df2.set_axis(index2)[ix3]])

And now:

>>> u1
       D  E  F
df1_0  x  y  z

>>> u2
       G  H  I
df2_0  p  q  r

>>> common
       A  B  C
df1_0  1  2  3
df2_0  4  5  6

Upvotes: 1

p.habermanm
p.habermanm

Reputation: 85

You could use pd.DataFrame.merge to merge df1 and df2. Then simply drop the unwanted columns from merged df to create ndf1

To create the ndf2 and ndf3 just drop the unwanted A, B, and C columns from df1 and df2 using pd.DataFrame.drop

FYI

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html

Upvotes: 0

Related Questions