Reputation: 41
I have a code that reads a column in a CSV file with 3 columns: Zone, Offnet calls, and Traffic.
Sample data:
Zone Offnet calls Traffic
zone0 0 0
zone1 421 30167
zone2 667 23172
zone3 12146 215033
zone4 7163 126514
zone5 5211 130045
zone6 1374 75357
zone7 3702 257846
zone8 1129 77677
zone9 2679 79331
I need "Offnet calls" and "Traffic" to create a list. For example, row 2 will be [421, 30167] and make a search of best match/closest value from the list which contains lists of same parameters. It will more clear by seeing the code:
tp_usp15 = [10, 200]
tp_usp23 = [15, 250]
tp_usp27 = [20, 300]
list_usp = [tp_usp15,tp_usp23, tp_usp27]
tp_bsnspls_s = [1,30]
tp_bsnspls_steel = [13,250]
tp_bsnspls_chrome = [18,350]
list_bsnspls = [tp_bsnspls_s,tp_bsnspls_steel,tp_bsnspls_chrome]
tp_bsnsrshn10 = [10,200]
tp_bsnsrshn15 = [15,300]
tp_bsnsrshn20 = [20,400]
list_bsnsrshn = [tp_bsnsrshn10,tp_bsnsrshn15,tp_bsnsrshn20]
common_list = list_usp + list_bsnspls + list_bsnsrshn
For example, from this list provided in the code, the closest value/best match for row 2 = [421, 30167] is [20, 400] = tp_bsnsrshn20. And I need a code that will make the same operation for all the values in the CSV file. The closest value/best match needs to be recorded to the next column (a new column called "Best match" should be created right next to the "traffic" column). I have a code that works for inputs. 2 user inputs create a list and the search is done from the list of lists.
client_traffic = int(input("Enter the expected monthly traffic: "))
client_offnet = int(input("Enter monthly offnet calls: "))
list_client = [client_payment, client_offnet]
from functools import partial
def distance_squared(x, y):
return (x[0] - y[0])**2 + (x[1] - y[1])**2
best_match_overall = min(common_list, key=partial(distance_squared, list_client))
name_best_match_overall = [k for k,v in locals().items() if v == best_match_overall][0]
How to apply this code to work for the whole CSV file. By the way, it gives also the name of value. I guess it should not be hard for advanced users to create some loop that will work by the same concept that I have provided in the last code but for the whole file. I am really struggling at this point. Thanks in advance, guys!
Upvotes: 1
Views: 138
Reputation: 120509
Input data:
>>> df
Zone Offnet calls Traffic
0 zone0 0 0
1 zone1 421 30167
2 zone2 667 23172
3 zone3 12146 215033
4 zone4 7163 126514
5 zone5 5211 130045
6 zone6 1374 75357
7 zone7 3702 257846
8 zone8 1129 77677
9 zone9 2679 79331
Build your reference list as dataframe:
ref = {'tp_usp15': [10, 200],
'tp_usp23': [15, 250],
'tp_usp27': [20, 300],
'tp_bsnspls_s': [1, 30],
'tp_bsnspls_steel': [13, 250],
'tp_bsnspls_chrome': [18, 350],
'tp_bsnsrshn10': [10, 200],
'tp_bsnsrshn15': [15, 300],
'tp_bsnsrshn20': [20, 400]}
df1 = pd.DataFrame(ref, index=['crit1', 'crit2']).T.rename_axis('Name')
df1['Best Match'] = list(map(list, df1.values))
>>> df1
crit1 crit2 Best Match
Name
tp_usp15 10 200 [10, 200]
tp_usp23 15 250 [15, 250]
tp_usp27 20 300 [20, 300]
tp_bsnspls_s 1 30 [1, 30]
tp_bsnspls_steel 13 250 [13, 250]
tp_bsnspls_chrome 18 350 [18, 350]
tp_bsnsrshn10 10 200 [10, 200]
tp_bsnsrshn15 15 300 [15, 300]
tp_bsnsrshn20 20 400 [20, 400]
Creates the cartesian product from df
and df1
and compute the squared distance:
cx = pd.merge(df.reset_index(), df1.reset_index(), how='cross')
x0, x1, y0, y1 = cx[['Offnet calls', 'Traffic', 'crit1', 'crit2']].values.T
cx['distance'] = (x0 - y0)**2 + (x1 - y1)**2
Keep the closest value of each df
row:
cols = ['index', 'Zone', 'Offnet calls', 'Traffic', 'Best Match', 'Name']
out = cx.loc[cx.groupby('index')['distance'].idxmin(), cols] \
.set_index('index').rename_axis(None)
Output result:
>>> out
Zone Offnet calls Traffic Best Match Name
0 zone0 0 0 [1, 30] tp_bsnspls_s
1 zone1 421 30167 [20, 400] tp_bsnsrshn20
2 zone2 667 23172 [20, 400] tp_bsnsrshn20
3 zone3 12146 215033 [20, 400] tp_bsnsrshn20
4 zone4 7163 126514 [20, 400] tp_bsnsrshn20
5 zone5 5211 130045 [20, 400] tp_bsnsrshn20
6 zone6 1374 75357 [20, 400] tp_bsnsrshn20
7 zone7 3702 257846 [20, 400] tp_bsnsrshn20
8 zone8 1129 77677 [20, 400] tp_bsnsrshn20
9 zone9 2679 79331 [20, 400] tp_bsnsrshn20
Upvotes: 1