Reputation: 31
This is what I have thus far:
Stats2003 = np.loadtxt('/DataFiles/2003.txt')
Stats2004 = np.loadtxt('/DataFiles/2004.txt')
Stats2005 = np.loadtxt('/DataFiles/2005.txt')
Stats2006 = np.loadtxt('/DataFiles/2006.txt')
Stats2007 = np.loadtxt('/DataFiles/2007.txt')
Stats2008 = np.loadtxt('/DataFiles/2008.txt')
Stats2009 = np.loadtxt('/DataFiles/2009.txt')
Stats2010 = np.loadtxt('/DataFiles/2010.txt')
Stats2011 = np.loadtxt('/DataFiles/2011.txt')
Stats2012 = np.loadtxt('/DataFiles/2012.txt')
Stats = Stats2003, Stats2004, Stats2005, Stats2006, Stats2007, Stats2008, Stats2009, Stats2010, Stats2011, Stats2012
I am trying to calculate the Euclidean distance between each of these arrays and every other array, but I am having difficulty doing so.
I can get the output I want by calculating each distance by hand, like this:
dist1 = np.linalg.norm(Stats2003-Stats2004)
dist2 = np.linalg.norm(Stats2003-Stats2005)
dist11 = np.linalg.norm(Stats2004-Stats2005)
and so on, but I would like to make these calculations with a loop.
I am displaying the results in a table using PrettyTable.
Can anyone point me in the right direction? I haven't found any previous solutions that have worked.
Upvotes: 3
Views: 2029
Reputation: 362507
To do the loop you will need to keep data out of your variable names. A simple solution would be to use dictionaries instead. The loops are implicit in the dict comprehensions:
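import itertools as it
import numpy as np
years = range(2003, 2013)
# one array per year, keyed by the year itself
stats = {y: np.loadtxt('/DataFiles/{}.txt'.format(y)) for y in years}
# one distance per unordered pair of years
dists = {(y1, y2): np.linalg.norm(stats[y1] - stats[y2]) for (y1, y2) in it.combinations(years, 2)}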
Now you can access the stats for a particular year, e.g. 2007, with stats[2007], and the distances with tuples, e.g. dists[(2007, 2011)].
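If you prefer the loop spelled out, the same idea can be sketched as below. This is only an illustration: the random arrays are hypothetical stand-ins for the contents of the yearly files, and the `rows` list is just one assumed way of preparing the results for a table printer.

```python
import itertools as it
import numpy as np

# Hypothetical stand-in data: one small random array per year,
# used here instead of loading /DataFiles/<year>.txt.
rng = np.random.default_rng(0)
years = range(2003, 2013)
stats = {y: rng.random((4, 3)) for y in years}

# Explicit loop over every unordered pair of years.
dists = {}
for y1, y2 in it.combinations(years, 2):
    dists[(y1, y2)] = np.linalg.norm(stats[y1] - stats[y2])

# One (year, year, distance) row per pair, sorted by year.
rows = [(y1, y2, round(float(d), 4)) for (y1, y2), d in sorted(dists.items())]
for row in rows[:3]:
    print(row)
```

Each entry of `rows` could then be handed to whatever table library you use, one row at a time.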
Upvotes: 2
Reputation: 13485
Look at scipy.spatial.distance.cdist.
From the documentation:
Computes distance between each pair of the two collections of inputs.
So you could do something like the following:
import numpy as np
from scipy.spatial.distance import cdist
# start year to stop year
years = range(2003,2013)
# this will yield an n_years X n_features array; cdist needs 2-D input,
# so each file is flattened in case it loads as a 2-D array
features = np.array([np.loadtxt('/DataFiles/%s.txt' % year).ravel() for year in years])
# compute the euclidean distance from each year to every other year
distance_matrix = cdist(features,features,metric = 'euclidean')
If you know the start year and you aren't missing data for any year, it's easy to determine which two years are being compared at coordinate (m, n) in the distance matrix: row m corresponds to year start + m, and column n to year start + n.
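The index-to-year bookkeeping could look something like the sketch below. To keep it self-contained, it uses random vectors as hypothetical stand-in data and builds the distance matrix with plain NumPy broadcasting rather than cdist; cdist(features, features) should give the same matrix. The helper names `years_at` and `distance_between` are made up for illustration.

```python
import numpy as np

start_year = 2003
years = list(range(start_year, 2013))

# Hypothetical stand-in data: one flattened feature vector per year.
rng = np.random.default_rng(0)
features = rng.random((len(years), 6))

# Pairwise Euclidean distances via broadcasting (stand-in for cdist).
diff = features[:, None, :] - features[None, :, :]
distance_matrix = np.sqrt((diff ** 2).sum(axis=-1))


def years_at(m, n, start=start_year):
    """Return the two years compared at distance_matrix[m, n]."""
    return start + m, start + n


def distance_between(y1, y2, start=start_year):
    """Look up the distance between two years by subtracting the start year."""
    return distance_matrix[y1 - start, y2 - start]


print(years_at(4, 8))  # (2007, 2011)
print(distance_between(2007, 2011))
```

The matrix is symmetric with zeros on the diagonal, so only the entries above the diagonal are needed when listing each pair once.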
Upvotes: 2