Reputation: 77
The function needs to be able to check a file for duplicates in each row and column.
Example of file with duplicates:
A B C
A A B
B C A
As you can see, row 2 contains a duplicate (two A's), and so does column 1. Code:
def duplication_char(dc):
    with open(dc, "r") as duplicatechars:
        linecheck = duplicatechars.readlines()
    linecheck = [line.split() for line in linecheck]
    # check rows
    for row in linecheck:
        if len(set(row)) != len(row):
            print("duplicates", " ".join(row))
    # check columns by transposing with zip
    for column in zip(*linecheck):
        if len(set(column)) != len(column):
            print("duplicates", " ".join(column))
Upvotes: 0
Views: 190
Reputation: 103744
You can read the file into a list of lists and use zip to transpose it.
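As a quick illustration of the transpose step (using the grid from the question), zip(*data) yields the columns as tuples:

```python
# zip(*data) unpacks the rows as arguments to zip,
# which pairs up their elements position by position,
# i.e. it yields the columns of the grid.
data = [["A", "B", "C"],
        ["A", "A", "B"],
        ["B", "C", "A"]]

cols = list(zip(*data))
print(cols)  # [('A', 'A', 'B'), ('B', 'A', 'C'), ('C', 'B', 'A')]
```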
Given your example, try:
from collections import Counter

with open(fn) as fin:
    data = [line.split() for line in fin]

rowdups = {}
coldups = {}
for d, m in ((rowdups, data), (coldups, zip(*data))):
    for i, sl in enumerate(m):
        count = Counter(sl)
        for c in count.most_common():
            if c[1] > 1:
                d.setdefault(i, []).append(c)
>>> rowdups
{1: [('A', 2)]}
>>> coldups
{0: [('A', 2)]}
Upvotes: 1
Reputation: 15423
Well, here is how I would do it.
First, read your file and create a 2D numpy array from its contents:
import numpy

with open('test.txt', 'r') as fil:
    lines = fil.readlines()
lines = [line.strip().split() for line in lines]
arr = numpy.array(lines)
Then, check each row for duplicates using sets (a set has no duplicates, so if the length of the set differs from the length of the row, the row contains duplicates):
for row in arr:
    if len(set(row)) != len(row):
        print('Duplicates in row:', row)
Then, check if each column has duplicates using sets, by transposing your numpy array:
for col in arr.T:
    if len(set(col)) != len(col):
        print('Duplicates in column:', col)
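If the transpose step is unfamiliar, here is a minimal standalone demo (the array values are illustrative): .T flips rows and columns, so iterating over arr.T visits the original columns.

```python
import numpy

# A tiny 2x2 array: .T swaps rows and columns
arr = numpy.array([["A", "B"],
                   ["C", "D"]])

print(arr.T.tolist())  # [['A', 'C'], ['B', 'D']]
```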
If you wrap all of this in a function:
def check_for_duplicates(filename):
    import numpy
    with open(filename, 'r') as fil:
        lines = fil.readlines()
    lines = [line.strip().split() for line in lines]
    arr = numpy.array(lines)
    for row in arr:
        if len(set(row)) != len(row):
            print('Duplicates in row:', row)
    for col in arr.T:
        if len(set(col)) != len(col):
            print('Duplicates in column:', col)
As suggested by Apero, you can also do this without numpy using zip (https://docs.python.org/3/library/functions.html#zip):
def check_for_duplicates(filename):
    with open(filename, 'r') as fil:
        lines = fil.readlines()
    lines = [line.strip().split() for line in lines]
    for row in lines:
        if len(set(row)) != len(row):
            print('Duplicates in row:', row)
    for col in zip(*lines):
        if len(set(col)) != len(col):
            print('Duplicates in column:', col)
In your example, the numpy version prints:
# Duplicates in row: ['A' 'A' 'B']
# Duplicates in column: ['A' 'A' 'B']
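If you would rather get the duplicate locations back instead of printing them, a small sketch along the same set-based lines (the name find_duplicates is mine, not from either answer):

```python
def find_duplicates(grid):
    """Return (row_indices, col_indices) of rows/columns containing duplicates."""
    rows = [i for i, row in enumerate(grid) if len(set(row)) != len(row)]
    cols = [j for j, col in enumerate(zip(*grid)) if len(set(col)) != len(col)]
    return rows, cols

# The example grid from the question
grid = [line.split() for line in "A B C\nA A B\nB C A\n".splitlines()]
print(find_duplicates(grid))  # ([1], [0])
```

Returning indices makes the result easy to test or reuse, whereas the printing versions are only useful interactively.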
Upvotes: 4