Henry Nguyen
Henry Nguyen

Reputation: 29

Python to evaluate duplicate elements in csv files

I have 2 csv files:

csv 1:

CHANNEL
3
3
4
1
2
1
4
5

csv 2:

CHANNEL
1
2
2
3
4
4
4
5

I want to evaluate the state of the channel by finding the duplicate channels. If number of channel > 1, the state is 0, else the state is 1.

output csv:

index  channel 1  channel 2  channel 3  channel 4  channel 5
  1        0          1         0          0           0
  2        1          0         1          0           1

So far I have counted the duplicate channels but for 1 file only. Now I don't know how to read 2 csv files and create the output file.

import csv
import collections

with open("csvfile.csv") as f:
    csv_data = csv.reader(f,delimiter=",")
    next(csv_data)
    count = collections.Counter()
    for row in csv_data:
        channel = row[0]
        count[channel] += 1
    for channel, nb in count.items():
        if nb>1:

Upvotes: 0

Views: 72

Answers (2)

Mike67
Mike67

Reputation: 11342

You can read each file into a list then check the channel counts of each list.

Try this code:

ss1 = '''
CHANNEL
3
3
4
1
2
1
4
5
'''.strip()

ss2 = '''
CHANNEL
1
2
2
3
4
4
4
5
'''.strip()


with open("csvfile1.csv",'w') as f: f.write(ss1)  # write test file 1
with open("csvfile2.csv",'w') as f: f.write(ss2)  # write test file 2

#############################

with open("csvfile1.csv") as f:
   lines1 = f.readlines()[1:]  # skip header
   lines1 = [int(x) for x in lines1] # convert to ints
   
with open("csvfile2.csv") as f:
   lines2 = f.readlines()[1:]  # skip header
   lines2 = [int(x) for x in lines2] # convert to ints

lines = [lines1,lines2] # make list for iteration

state = [[0]*5,[0]*5]  # default zero for each state

for ci in [0,1]: # each file 
   for ch in range(5):  # each channel
      state[ci][ch] = 0 if lines[ci].count(ch+1) > 1 else 1 # check channel count, set state

# write to terminal
print('Index','Channel 1','Channel 2','Channel 3','Channel 4','Channel 5', sep = '  ')
print('  ',1,'     ','          '.join(str(c) for c in state[0]))
print('  ',2,'     ','          '.join(str(c) for c in state[1]))
    
# write to csv
with open('state.csv','w') as f:
   f.write('Index,Channel 1,Channel 2,Channel 3,Channel 4,Channel 5\n')
   f.write('1,' + ','.join(str(c) for c in state[0]) + '\n')
   f.write('2,' + ','.join(str(c) for c in state[1]) + '\n')

Output (terminal)

Index  Channel 1  Channel 2  Channel 3  Channel 4  Channel 5
   1       0          1          0          0          1
   2       1          0          1          0          1

Output (state.csv)

Index,Channel 1,Channel 2,Channel 3,Channel 4,Channel 5
1,0,1,0,0,1
2,1,0,1,0,1

Upvotes: 1

meTchaikovsky
meTchaikovsky

Reputation: 7666

You can use collections.Counter

from collections import Counter

# read the two files 
with open('file_0.csv','r') as source:
    zero = source.readlines()
with open('file_1.csv','r') as source:
    one = source.readlines()

# convert to integers 
# if the last item is not '\n', you only need [1:]
zero = [int(item) for item in zero[1:-1]]
one = [int(item) for item in one[1:-1]]
# combine two lists
zero += one 
# count the values with counter
channels_counts = Counter(zero)

unique_channels = sorted(set(channels_counts.keys()))
res = [0 if channels_counts[item] > 1 else 1 for item in unique_channels]

for ind,item in enumerate(res):
    print('channel %i' % ind,item)

Upvotes: 1

Related Questions