user1504774
user1504774

Reputation: 127

How to get average for specific list elements

I have the following part input file (has over 500 lines):

L1, a, b, 10, 20, pass,
L1, c, d, 11, 21, pass,
L1, e, f, 12, 22, pass,
L1, a, b, 13, 23, pass,
L1, e, f, 14, 34, pass,

I would like to get the average's of the duplicates i.e. the output as follows:

(where for L1, a, b, 11.5 = (10+13)/2, 21.5 = (20+23)/2)

L1, a, b, 11.5, 21.5
L1, c, d, 11, 21
L1, e, f, 13, 28

My current beginner python code is as follows- still working to tweak it better

 import csv
 from collections import defaultdict
 import numpy as np

 dd = defaultdict(list)
 with open("mean.csv") as input_file:
 for row in csv.reader(input_file):
            dd[tuple(row[:3])].append(float(row[3]))
            dd[tuple(row[:3])].append(float(row[4]))

 for k, v, m in dd.iteritems():
      if len(v) > 1:
           print (' '.join(k), np.mean(v), np.mean(m))

The error I get is:

   Traceback (most recent call last):
   File "average.py", line 11, in <module>
      for k, v, m in dd.iteritems():
   ValueError: need more than 2 values to unpack

Upvotes: 3

Views: 1441

Answers (2)

bmu
bmu

Reputation: 36174

With pandas this would be very short (and it should be fast).

You can do something like this (don't know the meaning or naming of your columns, so it depends what you want to use as the index of your DataFrame):

In [1]: df = pd.read_csv('mean.csv', delimiter=',', header=None)

In [2]: df
Out[2]: 
  X.1 X.2 X.3  X.4  X.5
0  L1   a   b   10   20
1  L1   c   d   11   21
2  L1   e   f   12   22
3  L1   a   b   13   23
4  L1   e   f   14   34

In [3]: df.groupby(['X.1', 'X.2', 'X.3']).mean()
Out[3]: 
              X.4   X.5
X.1 X.2 X.3            
L1   a   b   11.5  21.5
     c   d   11.0  21.0
     e   f   13.0  28.0

Upvotes: 1

Jon Clements
Jon Clements

Reputation: 142106

Untested, but something like this as base can be adapted for the other column... as this just does one at the moment.

import csv
from collections import defaultdict
import numpy as np

dd = defaultdict(list)
with open('in.csv') as fin:
    for row in csv.reader(fin):
        dd[tuple(row[:3])].append(float(row[3]))

for k, v in dd.iteritems():
    if len(v) > 1:
        print ' '.join(k), np.mean(v)

Upvotes: 6

Related Questions