Swetha
Swetha

Reputation: 109

Convert DataFrame of options with 5*5 grid of choice

Following are the standard values for score and grades. Score column contains numbers within specified range.

Score Grade1 Score2 Grade3 Grade4
<260 A1 <200 D1 ID1
260-275 A2 200-250 D2 ID2
275-280 A3 250-260 D3 ID3
280-285 A4 260-270 D4 ID4
>=285 A5 >=270 D5 ID5

Input :-

Score Grade1 Score2 Grade3 Grade4
290 A1 265 D3 ID2

Output:-

Tiers Grade Grade1 Score2 Grade3 Grade4
1 A1
2 ID2
3 D3
4 265
5 290

Input :-

Score Grade1 Score2 Grade3 Grade4
290 A1 265 D3 ID2

Upvotes: 0

Views: 174

Answers (2)

mozway
mozway

Reputation: 260500

Assuming df and df_input your two DataFrames, you can use a combination of and :

# get df_input as Series
s = df_input.loc[0]

# find identical values
mask = df.eq(s)

# match ranges
def match_range(val, ref):
    # extract the upper value, for the last replace by np.inf
    s = pd.to_numeric(ref.str.extract('[<-](\d+)', expand=False)).fillna(np.inf)
    # define default output as False
    out = np.zeros(len(ref), dtype=bool)
    # find matching position
    out[np.searchsorted(s, val)] = True
    return out

# apply match_range on columns for which a direct match failed
m = ~mask.any()
mask.loc[:, m] = df.loc[:, m].apply(lambda x: match_range(s[x.name], x))

# generate output
out = (pd.DataFrame(np.where(mask, s, ''),
                    index=np.arange(len(df))+1,
                    columns=df.columns)
         .rename_axis('Tiers').reset_index()
      )

Output:

   Tiers Score Grade1 Score2 Grade3 Grade4
0      1           A1                     
1      2                               ID2
2      3                         D3       
3      4                 265              
4      5   290                            

Used inputs:

df = pd.DataFrame({'Score': ['<260', '260-275', '275-280', '280-285', '>=285'],
                   'Grade1': ['A1', 'A2', 'A3', 'A4', 'A5'],
                   'Score2': ['<200', '200-250', '250-260', '260-270', '>=270'],
                   'Grade3': ['D1', 'D2', 'D3', 'D4', 'D5'],
                   'Grade4': ['ID1', 'ID2', 'ID3', 'ID4', 'ID5']})

df_input = pd.DataFrame({'Score': [290], 'Grade1': ['A1'], 'Score2': [265], 'Grade3': ['D3'], 'Grade4': ['ID2']})

Upvotes: 1

inquirer
inquirer

Reputation: 4803

I did, maybe someone will make it shorter. Will work if df2 has one row.Difficulty in creating a range for searching. If it is possible to specify two values from and to at once, then it can be made easier.

import pandas as pd
import numpy as np

hist = len(df1) - 1

df1['Score'] = df1['Score'].str.replace("<|>|=", "", regex=True).str.split('-')
df1['Score2'] = df1['Score2'].str.replace("<|>|=", "", regex=True).str.split('-')

df1.loc[0, ['Score', 'Score2']] = [[0, 0], [df1.loc[0, 'Score'][0], df1.loc[0, 'Score2'][0]]]
df1.loc[hist, ['Score', 'Score2']] = [[df1.loc[hist, 'Score'][0], df1.loc[hist, 'Score2'][0]], [np.inf, np.inf]]


ind = (df1['Score'].str[0].astype(float) <= df2['Score'].values[0]) & \
      (df1['Score'].str[1].astype(float) >= df2['Score'].values[0])

ind2 = (df1['Score2'].str[0].astype(float) <= df2['Score2'].values[0]) & \
      (df1['Score2'].str[1].astype(float) >= df2['Score2'].values[0])

df1 = df1[df1.eq(df2.iloc[0])]

df1.loc[df1[ind].index, 'Score'] = df2['Score'].values
df1.loc[df1[ind2].index, 'Score2'] = df2['Score2'].values

df1.insert(0, 'Tiers', range(1, hist + 2))

df1 = df1.fillna('')#if NaN values are needed instead of an empty string, comment out this line

print(df1)

Output

   Tiers Score Grade1 Score2 Grade3 Grade4
0      1           A1                     
1      2                               ID2
2      3                         D3       
3      4                 265              
4      5   290                   

Upvotes: 0

Related Questions