Reputation: 21
I have data in a file in this format:
+1 1:4 2:11 3:3 4:11 5:1 6:13 7:4 8:2 9:2 10:13
-1 1:2 2:7 3:4 4:12 5:3 6:4 7:3 8:12 9:2 10:12
+1 1:4 2:6 3:3 4:2 5:3 6:5 7:4 8:2 9:3 10:6
and so on....
where the number to the left of each colon is an 'index' and the number to the right is an integer that describes a certain attribute. For each line, if the number to the right of the colon is the same for the same index on another line, I want to store the total number of +1s and -1s in two separate variables. This is my code so far:
for i in lines:
    for word in i:
        if word.find(':')!=-1:
            att = word.split(':', 1)[-1]
            idx = word.split(':', 1)[0]
            for j in lines:
                clas = j.split(' ', 1)[0]
                if word.find(':')!=-1:
                    if idx ==word.split(':', 1)[0]:
                        if att ==word.split(':', 1)[0]:
                            if clas>0:
                                ifattandyes = ifattandyes+1
                            else:
                                ifattandno = ifattandno+1
My problem is that att and idx do not seem to update; I think word.find(':') just finds the first instance of a colon and runs with it. Can anyone help?
EDIT:
The above explanation has been confusing. I'm a bit stubborn about how the count of 1s and -1s is acquired. As each pair on each line is read, I want to search through the data for the number of +1s and -1s that the pair is involved in and store them into 2 separate variables. The reason for doing so is to calculate probabilities of each pair leading to a +1 or -1.
Upvotes: 0
Views: 104
Reputation: 77454
Your first error is in the second line:
    for word in i:
Since i is a string, this iterates over each character, not over each word.
You meant to use:
    for word in i.split():
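To see the difference, here is a minimal sketch (Python 3). The sample line is taken from the question's data; the variable names chars, tokens and pairs are only illustrative. Iterating over a string yields single characters, so no "word" ever holds a whole index:value pair, whereas split() yields whitespace-separated tokens:

```python
line = "+1 1:4 2:11 3:3"

# Iterating over a str yields one character at a time.
chars = [word for word in line]

# split() with no arguments breaks the line on runs of whitespace.
tokens = [word for word in line.split()]

print(chars[:4])   # ['+', '1', ' ', '1']
print(tokens)      # ['+1', '1:4', '2:11', '3:3']

# Each pair can then be separated into (index, attribute value).
pairs = [tuple(tok.split(":", 1)) for tok in tokens[1:]]
print(pairs)       # [('1', '4'), ('2', '11'), ('3', '3')]
```

Note that the class label comes out as the string '+1' or '-1', so it would need int() before a numeric comparison such as clas > 0.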
Upvotes: 0
Reputation: 59090
Here is a suggestion (provided I understand the question correctly):
#!/usr/bin/env python
# Python 2 (uses the print statement)
from collections import defaultdict

positives = defaultdict(int)
negatives = defaultdict(int)

for line in open('data'):
    # True for a '+1' line, False otherwise
    theclass = line[0:2] == '+1'
    for pair in line[2:].split():
        # bools count as 1/0 when added to an int
        positives[pair] += theclass
        negatives[pair] += not theclass

for key in positives.keys():
    print key, "\t+1:", positives[key], "\t-1:", negatives[key]
Applied to the following data:
$ cat data
+1 1:4 2:11 3:3 4:11 5:1 6:13 7:4 8:2 9:2 10:13
-1 1:2 2:7 3:4 4:12 5:3 6:4 7:3 8:12 9:2 10:12
+1 1:4 2:6 3:3 4:2 5:3 6:5 7:4 8:2 9:3 10:6
it gives:
$ python t.py
9:2 +1: 1 -1: 1
9:3 +1: 1 -1: 0
8:2 +1: 2 -1: 0
10:6 +1: 1 -1: 0
6:13 +1: 1 -1: 0
10:13 +1: 1 -1: 0
10:12 +1: 0 -1: 1
2:7 +1: 0 -1: 1
2:6 +1: 1 -1: 0
4:11 +1: 1 -1: 0
4:12 +1: 0 -1: 1
4:2 +1: 1 -1: 0
1:2 +1: 0 -1: 1
1:4 +1: 2 -1: 0
3:3 +1: 2 -1: 0
5:1 +1: 1 -1: 0
3:4 +1: 0 -1: 1
5:3 +1: 1 -1: 1
8:12 +1: 0 -1: 1
7:4 +1: 2 -1: 0
7:3 +1: 0 -1: 1
2:11 +1: 1 -1: 0
6:5 +1: 1 -1: 0
6:4 +1: 0 -1: 1
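The print statements in the script are Python 2. As a rough Python 3 sketch of the same idea, extended towards the probabilities mentioned in the question's edit: the function name tally, the in-memory sample list (standing in for the file) and the choice of positives / (positives + negatives) as the estimate are my assumptions, not something the question spells out.

```python
from collections import defaultdict

def tally(lines):
    """Count, for every index:value pair, how many +1 and -1 lines contain it."""
    positives = defaultdict(int)
    negatives = defaultdict(int)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        label, pairs = parts[0], parts[1:]
        # Anything that is not '+1' is treated as negative, as in the script above.
        target = positives if label == "+1" else negatives
        for pair in pairs:
            target[pair] += 1
    return positives, negatives

sample = [
    "+1 1:4 2:11 3:3 4:11 5:1 6:13 7:4 8:2 9:2 10:13",
    "-1 1:2 2:7 3:4 4:12 5:3 6:4 7:3 8:12 9:2 10:12",
    "+1 1:4 2:6 3:3 4:2 5:3 6:5 7:4 8:2 9:3 10:6",
]

positives, negatives = tally(sample)
for pair in sorted(set(positives) | set(negatives)):
    pos, neg = positives.get(pair, 0), negatives.get(pair, 0)
    # Assumed estimate: fraction of the lines containing this pair that are +1.
    p_plus = pos / (pos + neg)
    print(f"{pair}\t+1: {pos}\t-1: {neg}\tP(+1): {p_plus:.2f}")
```

With so few lines the estimates are of course very coarse; the point is only that one pass over the data is enough, with no nested loop over all lines for each pair.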
Upvotes: 3
Reputation: 353039
I'll make this community wiki because it's too close (in spirit, anyway) to an answer already posted, but it has a few advantages:
from collections import Counter

with open("datafile.dat") as fp:
    counts = {}
    for line in fp:
        parts = line.split()
        sign, keys = parts[0], parts[1:]
        counts.setdefault(sign, Counter()).update(keys)

all_keys = set().union(*counts.values())

for key in sorted(all_keys):
    print '{:8}'.format(key),
    print ' '.join('{}: {}'.format(c, counts[c].get(key, 0)) for c in counts)
which produces
10:12 +1: 0 -1: 1
10:13 +1: 1 -1: 0
10:6 +1: 1 -1: 0
1:2 +1: 0 -1: 1
1:4 +1: 2 -1: 0
[etc.]
Note that nowhere is there any reference to +1 or -1; sign can really be anything.
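As a small illustration of that last point, here is a Python 3 sketch of the same Counter approach fed with made-up labels. The labels spam, ham and eggs, the count_by_label helper and the sample lines are all hypothetical.

```python
from collections import Counter

def count_by_label(lines):
    """Map each leading label to a Counter of the pairs seen on its lines."""
    counts = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        label, keys = parts[0], parts[1:]
        counts.setdefault(label, Counter()).update(keys)
    return counts

sample = [
    "spam 1:4 2:11",
    "ham 1:4 2:7",
    "eggs 1:2 2:7",
    "spam 1:4 2:6",
]

counts = count_by_label(sample)
all_keys = set().union(*counts.values())
for key in sorted(all_keys):
    row = "  ".join(f"{label}: {counts[label].get(key, 0)}" for label in counts)
    print(f"{key:8}{row}")
```

Nothing in the helper names a particular label, so a third class needs no code change.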
Upvotes: 0
Reputation: 676
I'm not sure if I've got this or not.
# Python 2; input_file is assumed to be an open file (or any iterable of lines)
tot_up = {}; tot_dn = {}
for line in input_file:
    parts = line.split()  # split on whitespace
    up_or_down = parts[0]
    parts = parts[1:]
    if up_or_down == '-1':
        store = tot_dn
    else:
        store = tot_up
    for part in parts:
        store[part] = store.get(part, 0) + 1
print "Total +1s: ", sum(tot_up.values())
print "Total -1s: ", sum(tot_dn.values())
What this does not do, but could be done easily enough, is strip out the att:val pairs where no match was found.
But I'm not sure I've understood your requirements properly.
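For what it's worth, the stripping step might look something like this in Python 3. It assumes "no match" means the pair occurs on only one line in the whole data set, which is my reading rather than anything the question states; the helper name drop_unmatched and the hand-built counts are just for illustration.

```python
def drop_unmatched(tot_up, tot_dn):
    """Keep only pairs whose combined count across both stores is at least 2."""
    matched = {
        pair
        for pair in set(tot_up) | set(tot_dn)
        if tot_up.get(pair, 0) + tot_dn.get(pair, 0) >= 2
    }
    up = {pair: n for pair, n in tot_up.items() if pair in matched}
    dn = {pair: n for pair, n in tot_dn.items() if pair in matched}
    return up, dn

# Hand-built counts, purely for illustration.
tot_up = {"1:4": 2, "2:11": 1, "9:2": 1}
tot_dn = {"1:2": 1, "9:2": 1}

up, dn = drop_unmatched(tot_up, tot_dn)
print(up)  # {'1:4': 2, '9:2': 1}
print(dn)  # {'9:2': 1}
```

Here 2:11 and 1:2 are dropped because each was seen only once, while 9:2 survives because it appears once under each class.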
Upvotes: 1