Reputation: 7832
Let us say I have a tuple of strings as follows:
tos = ('12|edr4r\tedward\t21\n',
'1|edr4r\tedward\t21\n',
'3|edr4r\tedward\t21\n',
'8|edr4r\tedward\t21\n',
'10|edr4r\tedward\t21\n',
'2|edr4r\tedward\t21\n')
The format of each element in the tuple is:
'integer_number|id\tname\tage\n'
and, apart from the leading integer, every element contains the same information, in this case:
'edr4r\tedward\t21\n'
I also have a map list that tells which elements of tos the averages of the integer_numbers should be computed over:
map_lst = [0,0,1,2,1,0]
i.e., one average will be over tos[0], tos[1] and tos[5] (since 0 appears in positions 0, 1 and 5 of map_lst), another over tos[2] and tos[4], and finally one over tos[3].
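The mapping described above can be inverted into "group label -> positions in tos" with a dictionary. A minimal sketch using `collections.defaultdict` (the name `positions_by_group` is just illustrative):

```python
from collections import defaultdict

map_lst = [0, 0, 1, 2, 1, 0]

# Invert the map: for each group label, collect the tos positions that belong to it.
positions_by_group = defaultdict(list)
for position, group in enumerate(map_lst):
    positions_by_group[group].append(position)

print(dict(positions_by_group))  # {0: [0, 1, 5], 1: [2, 4], 2: [3]}
```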
I'd like to compute the averages of the numbers before '|' and collect them in an avgs_list containing the averages plus (only) some of the information from each element of tos:
avgs_list = ['edr4r\tedward\t(12+1+2)/3\n',
'edr4r\tedward\t(3+10)/2\n',
'edr4r\tedward\t8\n']
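The desired output keeps only the id and name of each element and drops the age. A minimal sketch of pulling those fields out of one element, assuming each element has exactly one '|' followed by three tab-separated fields (`parse_element` is a hypothetical helper, not part of any library):

```python
def parse_element(s):
    """Split one tos element into (number, id, name, age).

    Hypothetical helper: assumes exactly one '|' and three tab-separated fields.
    """
    number, _, rest = s.partition('|')                  # '12' and 'edr4r\tedward\t21\n'
    user_id, name, age = rest.rstrip('\n').split('\t')  # drop the newline, split fields
    return int(number), user_id, name, age


print(parse_element('12|edr4r\tedward\t21\n'))  # (12, 'edr4r', 'edward', '21')
```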
Is there a pythonic way to do this? I am looking for a solution that is as generic as possible, without hardcoding the number of indexes, etc.
I could loop over the list, store the values and then compute the averages, but I thought there might be a more pythonic way to do it, using the map function or something else...
Upvotes: 0
Views: 101
Reputation: 4701
How is this?
def average(tos, map_lst):
    """
    given
        tos: a sequence of N|user\tname\tAGE\n
        map_lst: a list with positions corresponding to those in tos, and values
            indicating which group each tos element will be averaged with.
    return the groups of averages as a list of user\tname\tAVG\n
    """
    # get the leading nums (kept as strings)
    nums = [s.partition('|')[0] for s in tos]

    # group them into lists that will be averaged together (based on the map);
    # indexing by label assumes the labels in map_lst are 0, 1, ..., k-1
    avg_groups = [[] for i in set(map_lst)]
    for i, n in zip(map_lst, nums):
        avg_groups[i].append(n)

    # generate the averages
    def fmt(tup):
        mid = tos[0].partition('|')[2].rpartition('\t')[0]  # user\tname
        if len(tup) > 1:
            avg = '({0})/{1}'.format('+'.join(tup), len(tup))
        else:
            avg = str(tup[0])
        return "{0}\t{1}\n".format(mid, avg)

    return [fmt(l) for l in avg_groups]
Test:
tos = ('12|edr4r\tedward\t21\n','1|edr4r\tedward\t21\n','3|edr4r\tedward\t21\n','8|edr4r\tedward\t21\n','10|edr4r\tedward\t21\n','2|edr4r\tedward\t21\n')
map_lst = [0,0,1,2,1,0]
print(average(tos,map_lst))
>> ['edr4r\tedward\t(12+1+2)/3\n', 'edr4r\tedward\t(3+10)/2\n', 'edr4r\tedward\t8\n']
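One limitation worth noting: `avg_groups[i]` assumes the labels in map_lst are exactly 0, 1, ..., k-1, so labels like [7, 7, 3, ...] would raise an IndexError. A minimal variant, assuming the groups should be emitted in ascending label order, keys the groups by label with a dict instead (`average_by_label` is a hypothetical name):

```python
from collections import defaultdict


def average_by_label(tos, map_lst):
    """Like average() above, but map_lst may use any sortable labels."""
    # Group the leading numbers (kept as strings) by their label.
    groups = defaultdict(list)
    for label, s in zip(map_lst, tos):
        groups[label].append(s.partition('|')[0])

    # 'user\tname' part, taken from the first element as in the answer above.
    mid = tos[0].partition('|')[2].rpartition('\t')[0]

    result = []
    for label in sorted(groups):
        nums = groups[label]
        if len(nums) > 1:
            avg = '({0})/{1}'.format('+'.join(nums), len(nums))
        else:
            avg = nums[0]
        result.append('{0}\t{1}\n'.format(mid, avg))
    return result


tos = ('12|edr4r\tedward\t21\n', '1|edr4r\tedward\t21\n', '3|edr4r\tedward\t21\n',
       '8|edr4r\tedward\t21\n', '10|edr4r\tedward\t21\n', '2|edr4r\tedward\t21\n')
print(average_by_label(tos, [7, 7, 3, 9, 3, 7]))
# ['edr4r\tedward\t(3+10)/2\n', 'edr4r\tedward\t(12+1+2)/3\n', 'edr4r\tedward\t8\n']
```

Sorting the labels keeps the output order deterministic; with map_lst = [0,0,1,2,1,0] it gives the same list as average().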
Upvotes: 1
Reputation: 15730
You could use pandas:
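import re
import pandas as pd

# split '12|edr4r\tedward\t21\n' on '|' and tabs -> ['12', 'edr4r', 'edward', '21\n']
data = [re.split(r'\t|\|', x) for x in tos]
data = pd.DataFrame(data)
data[3] = data[3].str.rstrip('\n')  # strip the trailing newline from the age
data[0] = data[0].astype(int)       # leading numbers as ints
data[4] = map_lst                   # group labels from the map
data.groupby([1, 2, 3, 4])[0].mean()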
Out[1]:
1      2       3   4
edr4r  edward  21  0    5.0
                   1    6.5
                   2    8.0
Name: 0, dtype: float64
Upvotes: 1
Reputation: 122126
To actually calculate the averages of the leading integers, you could use something like:
averages = []
for n in range(max(map_lst) + 1):  # however many averages are needed
    averages.append(sum(int(v.split("|")[0])  # get int from v
                        for i, v in enumerate(tos)  # index and value
                        if map_lst[i] == n)  # whether to use this v
                    / float(map_lst.count(n)))  # divide by number of ints in group n
For your data, this gives
averages == [5.0, 6.5, 8.0]
I am a little confused by your output format, which seems to include the calculation to carry out but not the answer. I think you should focus less on using strings in your code; parse them at the start, create them at the end, but use other data structures in-between.
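Following that advice, a minimal sketch: parse each string into an int and its 'id\tname' prefix up front, group plain ints, and only build strings at the end. It assumes, as the question states, that all elements share the same id and name, and it writes the computed mean rather than the formula; `mean_strings` is a hypothetical name.

```python
from collections import defaultdict


def mean_strings(tos, map_lst):
    # 1. Parse: each string -> (int, 'id\tname') pair.
    parsed = []
    for s in tos:
        number, _, rest = s.partition('|')
        prefix = rest.rpartition('\t')[0]  # drop the age field
        parsed.append((int(number), prefix))

    # 2. Group the ints by map label, remembering each group's prefix.
    numbers = defaultdict(list)
    prefixes = {}
    for label, (number, prefix) in zip(map_lst, parsed):
        numbers[label].append(number)
        prefixes.setdefault(label, prefix)

    # 3. Format: build the output strings only at the end (true division in Python 3).
    return ['{0}\t{1}\n'.format(prefixes[label], sum(numbers[label]) / len(numbers[label]))
            for label in sorted(numbers)]


tos = ('12|edr4r\tedward\t21\n', '1|edr4r\tedward\t21\n', '3|edr4r\tedward\t21\n',
       '8|edr4r\tedward\t21\n', '10|edr4r\tedward\t21\n', '2|edr4r\tedward\t21\n')
map_lst = [0, 0, 1, 2, 1, 0]
print(mean_strings(tos, map_lst))
# ['edr4r\tedward\t5.0\n', 'edr4r\tedward\t6.5\n', 'edr4r\tedward\t8.0\n']
```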
Upvotes: 1