Reputation: 109
Following are the standard values for score and grades. Score column contains numbers within specified range.
Score | Grade1 | Score2 | Grade3 | Grade4 |
---|---|---|---|---|
<260 | A1 | <200 | D1 | ID1 |
260-275 | A2 | 200-250 | D2 | ID2 |
275-280 | A3 | 250-260 | D3 | ID3 |
280-285 | A4 | 260-270 | D4 | ID4 |
>=285 | A5 | >=270 | D5 | ID5 |
Input :-
Score | Grade1 | Score2 | Grade3 | Grade4 |
---|---|---|---|---|
290 | A1 | 265 | D3 | ID2 |
Output:-
Tiers | Grade | Grade1 | Score2 | Grade3 | Grade4 |
---|---|---|---|---|---|
1 | A1 | ||||
2 | ID2 | ||||
3 | D3 | ||||
4 | 265 | ||||
5 | 290 |
Input :-
Score | Grade1 | Score2 | Grade3 | Grade4 |
---|---|---|---|---|
290 | A1 | 265 | D3 | ID2 |
Upvotes: 0
Views: 174
Reputation: 260500
Assuming df
and df_input
your two DataFrames, you can use a combination of pandas and numpy:
# get df_input as Series
s = df_input.loc[0]
# find identical values
mask = df.eq(s)
# match ranges
def match_range(val, ref):
# extract the upper value, for the last replace by np.inf
s = pd.to_numeric(ref.str.extract('[<-](\d+)', expand=False)).fillna(np.inf)
# define default output as False
out = np.zeros(len(ref), dtype=bool)
# find matching position
out[np.searchsorted(s, val)] = True
return out
# apply match_range on columns for which a direct match failed
m = ~mask.any()
mask.loc[:, m] = df.loc[:, m].apply(lambda x: match_range(s[x.name], x))
# generate output
out = (pd.DataFrame(np.where(mask, s, ''),
index=np.arange(len(df))+1,
columns=df.columns)
.rename_axis('Tiers').reset_index()
)
Output:
Tiers Score Grade1 Score2 Grade3 Grade4
0 1 A1
1 2 ID2
2 3 D3
3 4 265
4 5 290
Used inputs:
df = pd.DataFrame({'Score': ['<260', '260-275', '275-280', '280-285', '>=285'],
'Grade1': ['A1', 'A2', 'A3', 'A4', 'A5'],
'Score2': ['<200', '200-250', '250-260', '260-270', '>=270'],
'Grade3': ['D1', 'D2', 'D3', 'D4', 'D5'],
'Grade4': ['ID1', 'ID2', 'ID3', 'ID4', 'ID5']})
df_input = pd.DataFrame({'Score': [290], 'Grade1': ['A1'], 'Score2': [265], 'Grade3': ['D3'], 'Grade4': ['ID2']})
Upvotes: 1
Reputation: 4803
I did, maybe someone will make it shorter. Will work if df2 has one row.Difficulty in creating a range for searching. If it is possible to specify two values from and to at once, then it can be made easier.
import pandas as pd
import numpy as np
hist = len(df1) - 1
df1['Score'] = df1['Score'].str.replace("<|>|=", "", regex=True).str.split('-')
df1['Score2'] = df1['Score2'].str.replace("<|>|=", "", regex=True).str.split('-')
df1.loc[0, ['Score', 'Score2']] = [[0, 0], [df1.loc[0, 'Score'][0], df1.loc[0, 'Score2'][0]]]
df1.loc[hist, ['Score', 'Score2']] = [[df1.loc[hist, 'Score'][0], df1.loc[hist, 'Score2'][0]], [np.inf, np.inf]]
ind = (df1['Score'].str[0].astype(float) <= df2['Score'].values[0]) & \
(df1['Score'].str[1].astype(float) >= df2['Score'].values[0])
ind2 = (df1['Score2'].str[0].astype(float) <= df2['Score2'].values[0]) & \
(df1['Score2'].str[1].astype(float) >= df2['Score2'].values[0])
df1 = df1[df1.eq(df2.iloc[0])]
df1.loc[df1[ind].index, 'Score'] = df2['Score'].values
df1.loc[df1[ind2].index, 'Score2'] = df2['Score2'].values
df1.insert(0, 'Tiers', range(1, hist + 2))
df1 = df1.fillna('')#if NaN values are needed instead of an empty string, comment out this line
print(df1)
Output
Tiers Score Grade1 Score2 Grade3 Grade4
0 1 A1
1 2 ID2
2 3 D3
3 4 265
4 5 290
Upvotes: 0