Reputation: 7879
I need to normalize a list of values so they form a probability distribution, i.e. each value lies between 0.0 and 1.0 and they sum to 1.0.
I understand how to normalize, but was curious if Python had a function to automate this.
I'd like to go from:
raw = [0.07, 0.14, 0.07]
to
normed = [0.25, 0.50, 0.25]
Upvotes: 64
Views: 191518
Reputation: 2637
Here is a not-terribly-inefficient one-liner similar to the top answer (it performs the summation only once):
norm = (lambda the_sum:[float(i)/the_sum for i in raw])(sum(raw))
A similar approach, min-max scaling to [0, 1], works for a list containing negative numbers (the result no longer sums to 1):
norm = (lambda the_max, the_min: [(float(i)-the_min)/(the_max-the_min) for i in raw])(max(raw),min(raw))
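If the inputs vary widely in magnitude, the built-in sum() can accumulate rounding error in the total. A minimal sketch (the helper name normalize_fsum is hypothetical) that computes the total once with the standard library's math.fsum and rejects a zero total with a clear error:

```python
import math

def normalize_fsum(values):
    """Scale a list of numbers so they sum to 1.

    math.fsum tracks exact partial sums, so the denominator carries
    less rounding error than the built-in sum().
    """
    total = math.fsum(values)
    if total == 0:
        raise ValueError("cannot normalize: values sum to zero")
    return [v / total for v in values]

raw = [0.07, 0.14, 0.07]
normed = normalize_fsum(raw)
print(normed)
```

This expects a list (or another sequence that can be iterated twice); a one-shot generator would be consumed by fsum before the division.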
Upvotes: 1
Reputation: 4543
Use scikit-learn:
import numpy as np
from sklearn.preprocessing import MinMaxScaler
# MinMaxScaler expects a 2D array: one row per sample, one column per feature
data = np.array([1,2,3]).reshape(-1, 1)
scaler = MinMaxScaler()
scaler.fit(data)
print(scaler.transform(data))
Upvotes: 0
Reputation: 2374
If you want to use scikit-learn, you can use normalize:
from sklearn.preprocessing import normalize
x = [1,2,3,4]
normalize([x]) # array([[0.18257419, 0.36514837, 0.54772256, 0.73029674]])
normalize([x], norm="l1") # array([[0.1, 0.2, 0.3, 0.4]])
normalize([x], norm="max") # array([[0.25, 0.5 , 0.75, 1.]])
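To see what the three norm options compute, here is an illustrative pure-Python sketch for a single row. normalize_row is a hypothetical helper, not scikit-learn's code, and it may differ from the library on edge cases. In this sketch, l1 divides by the sum of absolute values, l2 by the Euclidean length, and max by the largest absolute value:

```python
import math

def normalize_row(x, norm="l2"):
    """Scale one row of numbers by its l1, l2 or max norm (illustrative sketch)."""
    if norm == "l1":
        scale = sum(abs(v) for v in x)            # sum of absolute values
    elif norm == "l2":
        scale = math.sqrt(sum(v * v for v in x))  # Euclidean length
    elif norm == "max":
        scale = max(abs(v) for v in x)            # largest absolute value
    else:
        raise ValueError(f"unknown norm: {norm!r}")
    if scale == 0:
        return [float(v) for v in x]  # all-zero row: nothing to scale
    return [v / scale for v in x]

x = [1, 2, 3, 4]
print(normalize_row(x))              # roughly [0.1826, 0.3651, 0.5477, 0.7303]
print(normalize_row(x, norm="l1"))   # [0.1, 0.2, 0.3, 0.4]
print(normalize_row(x, norm="max"))  # [0.25, 0.5, 0.75, 1.0]
```

Only norm="l1" gives the sum-to-1 result the question asks for.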
Upvotes: 12
Reputation: 121
If you are working with tabular data, pandas is often the simplest tool.
This code puts raw into a single column and divides each entry by its column's sum. (You could also lay the data out as a row and normalize each row instead: sum(axis=1) gives row totals, and div(..., axis=0) aligns them with the rows.)
import pandas as pd
raw = [0.07, 0.14, 0.07]
raw_df = pd.DataFrame(raw)
normed_df = raw_df.div(raw_df.sum(axis=0), axis=1)
print(normed_df)
which prints:
      0
0  0.25
1  0.50
2  0.25
and from there you can keep working with the data.
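Without pandas, the same column-wise division can be sketched on a plain list of rows. normalize_columns and the two-column table below are hypothetical, for illustration only; zip(*rows) transposes the rows into columns to get each column's total, playing the role of raw_df.sum(axis=0):

```python
def normalize_columns(rows):
    """Divide each entry by its column total, so every column sums to 1.

    Raises ZeroDivisionError if a column sums to zero.
    """
    totals = [sum(col) for col in zip(*rows)]  # one total per column
    return [[v / t for v, t in zip(row, totals)] for row in rows]

table = [[0.07, 1.0],
         [0.14, 3.0],
         [0.07, 4.0]]
for row in normalize_columns(table):
    print(row)
```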
Upvotes: 2
Reputation: 1964
If you can use numpy, you get a much faster solution.
import random, time
import numpy as np
a = random.sample(range(1, 20000), 10000)
# note: sum(a) is recomputed for every element here, which makes this O(n^2)
since = time.time(); b = [i/sum(a) for i in a]; print(time.time()-since)
# 0.7956490516662598
since = time.time(); c = np.array(a); d = c/sum(a); print(time.time()-since)
# 0.001413106918334961
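Part of the gap in the timing above comes from the list comprehension calling sum(a) once per element. A sketch of a fairer pure-Python baseline that sums once, timed with the standard library's timeit; the function names are hypothetical and no particular timings are claimed, since they depend on the machine:

```python
import random
import timeit

a = random.sample(range(1, 20000), 10000)

def per_element_sum(values):
    # Recomputes sum(values) for every element: O(n**2) work.
    return [i / sum(values) for i in values]

def hoisted_sum(values):
    # Computes the total once: O(n) work.
    s = sum(values)
    return [i / s for i in values]

for fn in (per_element_sum, hoisted_sum):
    seconds = timeit.timeit(lambda: fn(a), number=1)
    print(f"{fn.__name__}: {seconds:.4f} s")
```

Comparing numpy against hoisted_sum isolates the benefit of vectorization from the cost of the repeated sum.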
Upvotes: 4
Reputation: 9714
Use
norm = [float(i)/sum(raw) for i in raw]
to normalize against the sum, so that the result always sums to 1.0 (or as close to it as floating point allows).
Use
norm = [float(i)/max(raw) for i in raw]
to normalize against the maximum.
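A small sketch of both variants with the denominator computed once rather than on every iteration. Because "as close to as possible" means the floating-point total can differ from 1.0 in the last bits, it checks the result with a tolerance:

```python
import math

raw = [0.07, 0.14, 0.07]

total = sum(raw)                          # compute the denominator once
norm_sum = [float(i) / total for i in raw]
# Rounding can leave the total a hair away from 1.0, so compare
# with a tolerance rather than ==.
print(math.isclose(sum(norm_sum), 1.0))   # True

peak = max(raw)                           # likewise, find the maximum once
norm_max = [float(i) / peak for i in raw]
print(max(norm_max))                      # 1.0: the largest input maps to exactly 1.0
```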
Upvotes: 116
Reputation: 3351
If your list has negative numbers, this is how you would normalize it (min-max scaling to [0, 1]):
a = range(-30,31,5)
norm = [(float(i)-min(a))/(max(a)-min(a)) for i in a]
Upvotes: 19
Reputation: 3246
Try this:
from __future__ import division  # true division on Python 2; harmless on Python 3
raw = [0.07, 0.14, 0.07]
def norm(input_list):
    norm_list = list()
    if isinstance(input_list, list):
        sum_list = sum(input_list)
        for value in input_list:
            tmp = value / sum_list
            norm_list.append(tmp)
    return norm_list
print(norm(raw))
This does what you asked, but I would suggest trying min-max normalization instead.
Min-max normalization:
def min_max_norm(dataset):
    if isinstance(dataset, list):
        norm_list = list()
        min_value = min(dataset)
        max_value = max(dataset)
        for value in dataset:
            tmp = (value - min_value) / (max_value - min_value)
            norm_list.append(tmp)
        return norm_list
Upvotes: 3
Reputation: 25053
How long is the list you're going to normalize?
def psum(it):
    "This function makes explicit how many calls to sum() are done."
    print("Another call!")
    return sum(it)
raw = [0.07,0.14,0.07]
print("How many calls to sum()?")
print([r/psum(raw) for r in raw])
print("\nAnd now?")
s = psum(raw)
print([r/s for r in raw])
# if one doesn't want auxiliary variables, it can be done inside
# a list comprehension, but in my opinion it's quite Baroque
print("\nAnd now?")
print([r/s for s in [psum(raw)] for r in raw])
Output
# How many calls to sum()?
# Another call!
# Another call!
# Another call!
# [0.25, 0.5, 0.25]
#
# And now?
# Another call!
# [0.25, 0.5, 0.25]
#
# And now?
# Another call!
# [0.25, 0.5, 0.25]
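The same check can be made with a counter instead of reading printed output. A sketch with a hypothetical CallCounter wrapper that records how many times the wrapped function runs:

```python
class CallCounter:
    """Wrap a function and count how many times it is called."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.fn(*args, **kwargs)

counted_sum = CallCounter(sum)
raw = [0.07, 0.14, 0.07]

inline = [r / counted_sum(raw) for r in raw]
print("sum() inside the comprehension:", counted_sum.calls)  # 3

counted_sum.calls = 0
s = counted_sum(raw)
hoisted = [r / s for r in raw]
print("sum() hoisted out:", counted_sum.calls)               # 1
```

For a list of n elements the inline version calls sum() n times, each over n elements, which is why it matters for long lists.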
Upvotes: 7
Reputation: 5534
There isn't any function in the standard library (to my knowledge) that does this, but third-party modules such as NumPy and scikit-learn have such functions. It's easy enough to write your own, though:
def normalize(lst):
    s = sum(lst)
    # build a list explicitly; on Python 3, map() would return a lazy iterator
    return [float(x) / s for x in lst]
Sample output:
>>> normed = normalize(raw)
>>> normed
[0.25, 0.5, 0.25]
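If the parts must sum to exactly 1 (floats only get close), the standard library's fractions module can do the arithmetic exactly. A sketch under the assumption that each float should be read as the decimal it prints as, which is why it goes through str(); normalize_exact is a hypothetical name:

```python
from fractions import Fraction

def normalize_exact(values):
    """Return Fractions that sum to exactly 1."""
    # Fraction(str(0.07)) == Fraction(7, 100); Fraction(0.07) would instead
    # capture the float's full binary value.
    parts = [Fraction(str(v)) for v in values]
    total = sum(parts)
    if total == 0:
        raise ValueError("cannot normalize: values sum to zero")
    return [p / total for p in parts]

raw = [0.07, 0.14, 0.07]
normed = normalize_exact(raw)
print(normed)                      # [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print([float(p) for p in normed])  # [0.25, 0.5, 0.25]
```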
Upvotes: 4