colbyjackson

Reputation: 175

Sort values from two different arrays

I have computed all the values, and I tried sorting, but sort() orders each list on its own rather than both together. I want to sort by year, with each grade staying paired with its year. However, when I call years.sort(), only years is reordered and the grades stay as they were.

When I open the file, it contains something like:

Year,Grade
2000,84
2001,34
2002,82
2012,74
2008,90

and so forth. I have already calculated the average grade for each year:

def read_averages(file):  # hypothetical name; the original snippet omitted the def line
    years, average_grades = [], []
    d = {}  # year -> [sum of grades, number of grades]

    with open(file, 'r') as f:
        next(f)  # skip the "Year,Grade" header
        for line in f:
            year, grade = (s.strip() for s in line.split(','))
            if year in d:
                d[year][0] += int(grade)
                d[year][1] += 1
            else:
                d[year] = [int(grade), 1]

    for year, grades in d.items():
        years.append(str(year))
        average_grades.append(float(grades[0]) / grades[1])

    return years, average_grades

Without sorting, the output looks similar to this:

2001 74.625
2006 72.241
2012 70.875
2017 69.1981
2005 72.5
2008 71.244
2014 73.318
2004 72.1
2007 72.88
2000 73.1

With years.sort(), it looks similar to this:

2000 74.625
2001 72.241
2002 70.875
2003 69.1981
2004 72.5
2005 71.244
2006 73.318
2007 72.1

So sorting reorders only the years; the grades stay in their original order and no longer match their years. This problem has been bugging me for a long time now. I am not planning to use pandas.

Upvotes: 0

Views: 90

Answers (4)

Unapiedra

Reputation: 16197

The other solutions take the two result lists and zip them together. Since you control how the file is read, I suggest never splitting the years and grades apart in the first place.

In my opinion this is easier than later combining the two lists with zip.

def read_year_grades(file):  # hypothetical function name
    d = {}  # year -> [sum of grades, number of grades]

    with open(file, 'r') as f:
        next(f)  # skip the header line
        for line in f:
            year, grade = (s.strip() for s in line.split(','))
            if year in d:
                d[year][0] += int(grade)
                d[year][1] += 1
            else:
                d[year] = [int(grade), 1]

    # List comprehension that converts the 'd' dictionary into a list of
    # (year, average grade) tuples.
    year_grade = [(year, float(grade_tuple[0]) / grade_tuple[1])
                  for year, grade_tuple in d.items()]

    # Sorting is optional if you return the list of tuples.
    # 'key=lambda ...' sorts on the year (the first element of each tuple).
    # Technically, specifying the key is not necessary: tuples compare
    # element by element, so the default sort already orders by year first.
    year_grade.sort(key=lambda x: x[0])

    return year_grade
    # Alternatively, return two tuples (years, grades) instead:
    # return tuple(zip(*year_grade))

Other improvements

You can use a defaultdict to avoid the if year in d block:

from collections import defaultdict

def read_year_grades(fname):  # hypothetical function name
    # Missing keys start as [sum of grades, count] = [0, 0].
    d = defaultdict(lambda: [0, 0])

    with open(fname, 'r') as f:
        next(f)  # skip the header line
        for line in f:
            year, grade = (s.strip() for s in line.split(','))
            d[year][0] += int(grade)
            d[year][1] += 1

    def avg(t):
        return float(t[0]) / t[1]

    year_grade = [(y, avg(g)) for y, g in d.items()]
    year_grade.sort()  # tuples sort by year first

    return zip(*year_grade)  # Python 3: tuple(zip(*year_grade))
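To try the defaultdict version without a real file, here is a minimal sketch that runs the same logic on the sample rows from the question. It assumes the data can be passed as a file-like object (io.StringIO stands in for open(fname)), and the helper name read_year_grades is hypothetical.

```python
import io
from collections import defaultdict


def read_year_grades(f):
    """Hypothetical helper: same logic as above, but takes an open file-like object."""
    d = defaultdict(lambda: [0, 0])  # year -> [sum of grades, count]
    next(f)  # skip the "Year,Grade" header
    for line in f:
        year, grade = (s.strip() for s in line.split(','))
        d[year][0] += int(grade)
        d[year][1] += 1
    # Build (year, average) pairs and sort them as pairs, so they never separate.
    year_grade = sorted((y, s / n) for y, (s, n) in d.items())
    return tuple(zip(*year_grade))


# Sample rows from the question, held in memory instead of a real file.
sample = io.StringIO("Year,Grade\n2000,84\n2001,34\n2002,82\n2012,74\n2008,90\n")
years, grades = read_year_grades(sample)
print(years)   # ('2000', '2001', '2002', '2008', '2012')
print(grades)  # (84.0, 34.0, 82.0, 90.0, 74.0)
```

The years stay strings here, as in the original code. String order matches numeric order only because every year has four digits; convert with int(year) if that might not hold.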

Upvotes: 0

Kaushik NP

Reputation: 6781

Use zip to pair them up as tuples and then sort.

Example :

>>> y = [3, 2, 4, 1, 2]
>>> g = [0.1, 0.4, 0.2, 0.7, 0.1]

>>> mix = list(zip(y, g))
>>> mix
[(3, 0.1), (2, 0.4), (4, 0.2), (1, 0.7), (2, 0.1)]

>>> sorted(mix)
[(1, 0.7), (2, 0.1), (2, 0.4), (3, 0.1), (4, 0.2)]

# Print them in your order:

>>> for ele in sorted(mix):
...     print(ele[0], ele[1])
...
1 0.7
2 0.1
2 0.4
3 0.1
4 0.2

Note that year 2 has two grades, 0.4 and 0.1. Tuples are compared element by element, so the year takes precedence and ties on the year are broken by the grade.
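If you would rather not reorder grades within the same year, a key that looks only at the year gives a different result. This short sketch compares the two, relying on the fact that Python's sort is stable, so pairs with equal keys keep their original relative order.

```python
mix = [(3, 0.1), (2, 0.4), (4, 0.2), (1, 0.7), (2, 0.1)]

# Default: compare the whole tuple, so the grade breaks ties on the year.
by_year_then_grade = sorted(mix)

# key= compares only the year; the stable sort keeps (2, 0.4) before (2, 0.1)
# because that is their order in mix.
by_year_only = sorted(mix, key=lambda p: p[0])

print(by_year_then_grade)  # [(1, 0.7), (2, 0.1), (2, 0.4), (3, 0.1), (4, 0.2)]
print(by_year_only)        # [(1, 0.7), (2, 0.4), (2, 0.1), (3, 0.1), (4, 0.2)]
```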

Upvotes: 1

Scott Mermelstein

Reputation: 15397

You want to add this line before the return statement:

years, average_grades = zip(*sorted(zip(years, average_grades), key=lambda p: p[0]))

What does this do?

The inner zip(years, average_grades) pairs up corresponding elements of years and average_grades as tuples (in Python 3 it returns an iterator of tuples, which sorted then consumes).

sorted(..., key=lambda p: p[0]) is the usual sorted function, but since it now operates on pairs, it needs to know how to order them. So we pass it a lambda function that says "look at the first part", i.e. the year.

The outer zip(*...) takes the list of tuples returned by sorted and transposes it back into two sequences. The * unpacks the list so that each pair becomes a separate argument to zip. zip accepts any number of iterables and groups their first elements together, their second elements together, and so on. In this case, it takes the ten pairs and splits them into 2 tuples of length 10 each (tuples, not lists).
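Here is the same one-liner broken into its three steps, with the intermediate values shown. The years and grades are illustrative values, not the asker's real results.

```python
years = ['2001', '2000', '2002']
average_grades = [74.625, 73.1, 70.875]

# Step 1: pair each year with its grade.
pairs = list(zip(years, average_grades))
# [('2001', 74.625), ('2000', 73.1), ('2002', 70.875)]

# Step 2: sort the pairs by year only.
ordered = sorted(pairs, key=lambda p: p[0])
# [('2000', 73.1), ('2001', 74.625), ('2002', 70.875)]

# Step 3: zip(*ordered) is zip(('2000', 73.1), ('2001', 74.625), ('2002', 70.875)),
# which groups all first elements together and all second elements together.
years, average_grades = zip(*ordered)
print(years)           # ('2000', '2001', '2002')
print(average_grades)  # (73.1, 74.625, 70.875)
```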

As long as your iterables have the same length, this is a basic mechanism for sorting them together. (If the lengths differ, zip silently stops at the shortest one.)
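The one-liner gives back tuples rather than lists, and it raises ValueError when both inputs are empty, because zip(*[]) produces nothing to unpack into two names. If either matters, here is a minimal sketch that handles both; sort_together is a hypothetical helper name.

```python
def sort_together(years, average_grades):
    """Hypothetical helper: sort both sequences by year and return two lists."""
    if not years:
        # zip(*[]) yields nothing, so unpacking it into two names would fail.
        return [], []
    pairs = sorted(zip(years, average_grades), key=lambda p: p[0])
    ys, gs = zip(*pairs)
    return list(ys), list(gs)


print(sort_together([2001, 2000, 2002], [5, 10, 15]))  # ([2000, 2001, 2002], [10, 5, 15])
print(sort_together([], []))                           # ([], [])
```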

Upvotes: 0

zipa

Reputation: 27869

I hope this example helps:

years = [2001,2000,2002]
average_grades = [5,10,15]
result = zip(years,average_grades)
for item in sorted(result, key=lambda x: x[0]):
    print('{} {}'.format(*item))
#2000 10
#2001 5
#2002 15

Upvotes: 0
