user9875189
user9875189

Reputation: 309

keep only rows of one dataframe whose values for certain columns are also presented in another dataframe

I have two dataframes that share several common columns as follows :

the first one df1:

    TG_ID   ORG_NO  TG_CAP  DATA_DATE   METER_ID    CONS_SORT   WIRING_MODE T_FACTOR    PHASE_FLAG  I1  ... I87 I88 I89 I90 I91 I92 I93 I94 I95 I96
0   1646176 134010902   315.0   20200801    8200000036483255    06  3   60.0    1   0.606333    ... 0.480667    0.524333    0.399333    0.430667    0.386667    0.396667    0.371333    0.314333    0.336333    0.289000
1   1646176 134010902   315.0   20200801    8200000036483255    06  3   60.0    2   0.606333    ... 0.480667    0.524333    0.399333    0.430667    0.386667    0.396667    0.371333    0.314333    0.336333    0.289000
2   1646176 134010902   315.0   20200801    8200000036483255    06  3   60.0    3   0.606333    ... 0.480667    0.524333    0.399333    0.430667    0.386667    0.396667    0.371333    0.314333    0.336333    0.289000
6   1646178 134010902   200.0   20200801    8200000033404961    06  3   60.0    1   0.338333    ... 0.199333    0.215000    0.191667    0.248333    0.154000    0.240000    0.129333    0.224667    0.152000    0.198000
7   1646178 134010902   200.0   20200801    8200000033404961    06  3   60.0    2   0.338333    ... 0.199333    0.215000    0.191667    0.248333    0.154000    0.240000    0.129333    0.224667    0.152000    0.198000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1117    900028797849    134010902   400.0   20200801    8200000035893592    06  3   120.0   2   0.286667    ... 0.192000    0.196667    0.203667    0.171000    0.140333    0.154333    0.126000    0.144000    0.144667    0.131333
1118    900028797849    134010902   400.0   20200801    8200000035893592    06  3   120.0   3   0.286667    ... 0.192000    0.196667    0.203667    0.171000    0.140333    0.154333    0.126000    0.144000    0.144667    0.131333
1119    900029405841    134010902   100.0   20200801    8200000036482415    06  3   30.0    1   0.060000    ... 0.088000    0.127667    0.084333    0.084333    0.123333    0.088000    0.084000    0.084333    0.084333    0.083667
1120    900029405841    134010902   100.0   20200801    8200000036482415    06  3   30.0    2   0.060000    ... 0.088000    0.127667    0.084333    0.084333    0.123333    0.088000    0.084000    0.084333    0.084333    0.083667
1121    900029405841    134010902   100.0   20200801    8200000036482415    06  3   30.0    3   0.060000    ... 0.088000    0.127667    0.084333    0.084333    0.123333    0.088000    0.084000    0.084333    0.084333    0.083667

and the second one df2:


    TG_ID   ORG_NO  DATA_DATE   METER_ID    CONS_SORT   WIRING_MODE PHASE_FLAG  U1  U2  U3  ... U87 U88 U89 U90 U91 U92 U93 U94 U95 U96
27  1646176 134010901   20200801    8200000034727355    04  1   1   248.7   247.4   248.8   ... 251.0   251.4   251.5   246.6   248.1   247.8   248.0   247.8   248.0   246.8
28  1646176 134010902   20200801    8200000032252506    05  1   1   248.5   247.8   248.9   ... 251.0   251.5   249.3   246.8   248.2   247.6   247.2   247.9   249.2   246.9
30  1646176 134010902   20200801    8200000031675334    05  1   1   248.5   248.3   250.4   ... 253.0   251.9   248.4   249.1   248.5   247.6   247.6   246.9   245.9   247.5
32  1646176 134010902   20200801    8200000032855108    05  1   1   240.9   241.1   242.8   ... 248.8   248.8   246.1   245.9   246.7   247.0   245.2   246.9   244.4   243.6
34  1646176 134010902   20200801    8200000032861235    05  1   1   241.8   242.7   243.9   ... 249.8   249.5   246.4   246.2   246.9   247.0   245.7   247.6   244.7   244.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
28323   900028797849    134010902   20200801    8200000032252390    05  1   1   229.6   231.3   231.2   ... 236.0   234.4   232.5   232.7   231.8   233.3   233.1   232.4   229.5   229.3
28324   900028797849    134010902   20200801    8200000035146454    05  1   1   233.2   232.5   233.1   ... 236.8   237.3   237.5   233.8   234.2   233.4   233.6   232.3   233.3   233.7
28325   900028797849    134010902   20200801    8200000035113338    05  1   1   233.1   232.6   233.0   ... 237.0   237.2   234.7   233.7   234.3   233.4   233.6   232.2   232.7   233.8
28326   900028797849    134010902   20200801    8200000035132418    05  1   1   233.7   233.4   233.8   ... 237.3   237.7   234.9   234.3   234.4   233.9   233.6   232.0   233.2   233.9
28327   900028797849    134010902   20200801    8200000035138508    05  1   1   231.9   233.3   233.4   ... 237.7   236.1   234.3   234.1   233.5   233.8   233.6   233.8   231.3   231.1

I want to keep rows in df2 whose values in columns "TG_ID", "ORG_NO" and "DATA_DATE" are also present in df1. For example, 27-th row of df2 has value "1646176", "134010901" and "20200801" for columns "TG_ID", "ORG_NO" and "DATA_DATE", these values are not all present in df1(since df1 only has value "134010902" for the column "ORG_NO"), so 27-th row of df2 should be filtered out. On the other hand, 28-th row of df2 should be kept since value "1646176", "134010902" and "20200801" are also presented in df1.

I´ve tried merge like following :

df3 = df2.merge(df1, on = ["TG_ID", "ORG_NO", "DATA_DATE"])

but it gives more rows and columns than df2, which is not expected.

And the following call to join() simply gives the error

len(left_on) must equal the number of levels in the index of "right"

df3 = df2.join(df1, on = ["TG_ID", "ORG_NO", "DATA_DATE"])

How can i achieve my requirement properly?

Upvotes: 1

Views: 770

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

You can set the columns TG_ID, ORG_NO and DATA_DATE as a multiindex in both the dataframes df1 and df2 then using MultiIndex.isin test the membership for index of df2 in the index of dataframe df1 to create a boolean mask:

c =  ["TG_ID", "ORG_NO", "DATA_DATE"]
mask = df2.set_index(c).index.isin(df1.set_index(c).index)

>>> df2[mask]

              TG_ID     ORG_NO DATA_DATE          METER_ID CONS_SORT WIRING_MODE PHASE_FLAG     U1     U2     U3  ...    U87    U88    U89    U90    U91    U92    U93    U94    U95    U96
28          1646176  134010902  20200801  8200000032252506        05           1          1  248.5  247.8  248.9  ...  251.0  251.5  249.3  246.8  248.2  247.6  247.2  247.9  249.2  246.9
30          1646176  134010902  20200801  8200000031675334        05           1          1  248.5  248.3  250.4  ...  253.0  251.9  248.4  249.1  248.5  247.6  247.6  246.9  245.9  247.5
32          1646176  134010902  20200801  8200000032855108        05           1          1  240.9  241.1  242.8  ...  248.8  248.8  246.1  245.9  246.7  247.0  245.2  246.9  244.4  243.6
34          1646176  134010902  20200801  8200000032861235        05           1          1  241.8  242.7  243.9  ...  249.8  249.5  246.4  246.2  246.9  247.0  245.7  247.6  244.7  244.0
...             ...        ...       ...               ...       ...         ...        ...    ...    ...    ...  ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...
28323  900028797849  134010902  20200801  8200000032252390        05           1          1  229.6  231.3  231.2  ...  236.0  234.4  232.5  232.7  231.8  233.3  233.1  232.4  229.5  229.3
28324  900028797849  134010902  20200801  8200000035146454        05           1          1  233.2  232.5  233.1  ...  236.8  237.3  237.5  233.8  234.2  233.4  233.6  232.3  233.3  233.7
28325  900028797849  134010902  20200801  8200000035113338        05           1          1  233.1  232.6  233.0  ...  237.0  237.2  234.7  233.7  234.3  233.4  233.6  232.2  232.7  233.8
28326  900028797849  134010902  20200801  8200000035132418        05           1          1  233.7  233.4  233.8  ...  237.3  237.7  234.9  234.3  234.4  233.9  233.6  232.0  233.2  233.9
28327  900028797849  134010902  20200801  8200000035138508        05           1          1  231.9  233.3  233.4  ...  237.7  236.1  234.3  234.1  233.5  233.8  233.6  233.8  231.3  231.1

Upvotes: 2

Related Questions