Reputation: 155
I have a table with top 3 reasons (Table 1) and another table with the category it belongs to for each variable (Table 2). I am trying to match the category bins into the reason table like in table 3.
Table 1: top 3 reasons, basically which 3 variable was the most important for that Register number
RegNo Reason1 Reason2 Reason3
1111 v3 v2 v6
2222 v2 v3 v5
3333 v3 v2 v6
4444 v3 v6 v2
5555 v3 v2 v5
Table 2: The category bin for each variable
RegNo v2 v3 v4 v5 v6
1111 (0.44, 0.64] (0.0, 60.0] missing (-inf, 3.0] (102.0, 175.0]
2222 (-inf, 0.33] (0.0, 60.0] welldone missing missing
3333 (0.44, 0.64] (0.0, 60.0] rare missing missing
4444 (0.44, 0.64] (0.0, 60.0] missing missing (20.0, 102.0]
5555 (0.64, inf] (0.0, 60.0] missing missing (-inf, 20.0]
Table 3: For each ID, find category bin label and replace reason1,2 & 3 with the labels
RegNo Reason1 Reason2 Reason3
1111 (0.0, 60.0] (0.44, 0.64] (102.0, 175.0]
2222 (-inf, 0.33] (0.0, 60.0] missing
3333 (0.0, 60.0] (0.44, 0.64] missing
4444 (0.0, 60.0] (20.0, 102.0] (0.44, 0.64]
5555 (0.0, 60.0] (0.64, inf] missing
Upvotes: 0
Views: 38
Reputation: 31226
Approach
join()
pd.concat()
of each of the reasons from output of joindf1 = pd.read_csv(io.StringIO(""" RegNo Reason1 Reason2 Reason3
1111 v3 v2 v6
2222 v2 v3 v5
3333 v3 v2 v6
4444 v3 v6 v2
5555 v3 v2 v5
"""), sep="\s+")
df2 = pd.read_csv(io.StringIO(""" RegNo v2 v3 v4 v5 v6
1111 (0.44, 0.64] (0.0, 60.0] missing (-inf, 3.0] (102.0, 175.0]
2222 (-inf, 0.33] (0.0, 60.0] welldone missing missing
3333 (0.44, 0.64] (0.0, 60.0] rare missing missing
4444 (0.44, 0.64] (0.0, 60.0] missing missing (20.0, 102.0]
5555 (0.64, inf] (0.0, 60.0] missing missing (-inf, 20.0]
"""),sep="\s\s+", engine="python")
# index main df
df1 = df1.set_index("RegNo")
# reshape and index reasons
dfm = df2.set_index("RegNo").stack()
df3 = pd.concat([
# add apprpriate column into index, don't want othert columns
(df1.set_index(c, append="True").loc[:,[]]
# now it's a straight forward join
.join(dfm.rename_axis(["RegNo",c]).to_frame())
# cleanup index and rename columns of joined DF
.droplevel(1).rename(columns={0:c}))
for c in ["Reason1","Reason2","Reason3"]], axis=1)
RegNo | Reason1 | Reason2 | Reason3 |
---|---|---|---|
1111 | (0.0, 60.0] | (0.44, 0.64] | (102.0, 175.0] |
2222 | (-inf, 0.33] | (0.0, 60.0] | missing |
3333 | (0.0, 60.0] | (0.44, 0.64] | missing |
4444 | (0.0, 60.0] | (20.0, 102.0] | (0.44, 0.64] |
5555 | (0.0, 60.0] | (0.64, inf] | missing |
Upvotes: 1