Using duplicates to create different text files

Question

I am struggling to create a text file from another text file.

My text file is:

0.0 99.13 0.11
0.5 19.67 0.59
0.5 22.23 1.22
1.0 9.67  0.08

and I would like to create a text file such as:

0.0 99.13 0.11
0.5 19.67 0.59
1.0 9.67  0.08

or

0.0 99.13 0.11
0.5 22.23 1.22
1.0 9.67  0.08

In general, whenever a value is duplicated in the first column of my file, I would like to create a separate file for each duplicate, with each file keeping just one of the duplicated lines.

My code so far is:

def createFile(file):
    with open(file, 'r') as fh:
        data = fh.read()
    for row in data.splitlines():
        column = row.split()
        print(column)

Output:

['0.0', '99.13', '0.11']
['0.5', '19.67', '0.59']
['0.5', '22.23', '1.22']
['1.0', '9.67', '0.08']

which would let me play with the indexes. Maybe I could check whether column[0] is repeated and then print the line? Or would creating a dictionary be easier?

Cheers, Kate

aldeb · Accepted Answer

If the duplicates are grouped in order, use itertools.groupby:

from itertools import groupby

data = """0.0 99.13 0.11
0.5 19.67 0.59
0.5 22.23 1.22
1.0 9.67  0.08""".split('\n')

result = [list(j) for i, j in groupby(data, lambda x: x.split(' ', 1)[0])]

files_num = 0
for e in result:
    files_num = max(files_num, len(e))

for i in range(files_num):
    with open('{}.txt'.format(i), 'w+') as f:
        for line in result:
            min_index = min(i, len(line)-1)
            f.write('{}\n'.format(line[min_index]))

0.txt:

0.0 99.13 0.11
0.5 19.67 0.59
1.0 9.67  0.08

1.txt:

0.0 99.13 0.11
0.5 22.23 1.22
1.0 9.67  0.08
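
A quick check of why the ordering matters: groupby only merges adjacent items that share a key. The sketch below reuses the same key function on a hypothetical reordering of the rows (the name unordered is made up for illustration) and shows that the key 0.5 then comes out as two separate groups.

```python
from itertools import groupby

# Same rows as above, but the two 0.5 lines are no longer adjacent.
unordered = ["0.0 99.13 0.11",
             "0.5 19.67 0.59",
             "1.0 9.67  0.08",
             "0.5 22.23 1.22"]

# Collect only the group keys, in the order groupby emits them.
keys = [key for key, _ in groupby(unordered, lambda x: x.split(' ', 1)[0])]
print(keys)  # ['0.0', '0.5', '1.0', '0.5']
```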

Otherwise, if they are not grouped in order, you can use a collections.OrderedDict this way (like 1_CR suggested, but with some changes):

from collections import OrderedDict

data = """0.0 99.13 0.11
0.5 19.67 0.59
1.0 9.67  0.08
0.5 22.23 1.22""".split('\n')

d = OrderedDict()
for line in data:
    split = line.split(' ', 1)
    d.setdefault(split[0], []).extend(split[1:])

print(d)

Output:

OrderedDict([ ('0.0', ['99.13 0.11']), 
              ('0.5', ['19.67 0.59', '22.23 1.22']), 
              ('1.0', ['9.67  0.08']) ])
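
The dictionary above only groups the lines; it does not yet produce the files. One possible way to finish the job is sketched below, assuming every line has at least two columns and that a key with fewer duplicates should reuse its last line, as in the groupby version. The helper names group_by_first_column, build_variants and write_variants are hypothetical, and the output names simply follow the 0.txt / 1.txt pattern used earlier.

```python
from collections import OrderedDict


def group_by_first_column(lines):
    """Map each first-column value to the remainders of all lines sharing it."""
    groups = OrderedDict()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, rest = line.split(None, 1)  # assumes at least two columns
        groups.setdefault(key, []).append(rest)
    return groups


def build_variants(groups):
    """Return one list of lines per output file.

    Variant i takes the i-th duplicate of every key and falls back to the
    last available one when a key has fewer than i + 1 entries.
    """
    count = max(len(rests) for rests in groups.values())
    return [
        ['{} {}'.format(key, rests[min(i, len(rests) - 1)])
         for key, rests in groups.items()]
        for i in range(count)
    ]


def write_variants(variants, pattern='{}.txt'):
    """Write each variant to its own file (0.txt, 1.txt, ... by default)."""
    for i, lines in enumerate(variants):
        with open(pattern.format(i), 'w') as f:
            f.write('\n'.join(lines) + '\n')


data = """0.0 99.13 0.11
0.5 19.67 0.59
1.0 9.67  0.08
0.5 22.23 1.22""".split('\n')

variants = build_variants(group_by_first_column(data))
for lines in variants:
    print(lines)
```

Calling write_variants(variants) afterwards would create one file per variant. Keys keep the order of their first appearance, so the 0.5 line stays between 0.0 and 1.0 even though its duplicate appears last in the input.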
