Rearrange data file

Question

I am trying to reorganize a .txt file containing a list of data with the traits in the columns and the families in the rows. Basically, I need to write a program that creates rows comparing the people in each family, so that the traits of persons 1 and 2, 1 and 3, and 2 and 3 are compared, i.e.:

A 1 2 7 8 9 10
A 1 3 7 9 9 11
etc.

where A is the family, the first two numbers are the people being compared, the 3rd and 4th numbers are trait 1 (e.g. a measurement) for each person, and the last two numbers are trait 2 (e.g. BMI) for each person.
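A minimal sketch of how one such row could be assembled, assuming each person's traits are already available as a list of strings in column order (the function name `make_row` is hypothetical):

```python
def make_row(family, id1, traits1, id2, traits2):
    """Build one comparison row: family, the two person IDs, then each trait
    for person 1 immediately followed by the same trait for person 2."""
    fields = [family, id1, id2]
    # zip pairs trait k of person 1 with trait k of person 2
    for t1, t2 in zip(traits1, traits2):
        fields.extend([t1, t2])
    return " ".join(fields)

print(make_row("A", "1", ["7", "9"], "2", ["8", "10"]))  # A 1 2 7 8 9 10
```

The interleaving is what puts the two values of each trait side by side.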

My input is like this:

A 1 trait trait
A 2 trait trait
A 3 trait trait

I was able to create a data frame using:

import pandas
data = pandas.read_csv('family.txt', sep=" ", header=None)
print(data)

I cannot seem to figure out an efficient way to combine the data into the rows needed above. Any help is greatly appreciated! Thank you.
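One way to avoid comparing every row with every other row is to group the rows by family first and then take each unordered pair of people within a family with `itertools.combinations`. This is a standard-library sketch, not a pandas solution, and it assumes space-separated lines laid out as above (family, person ID, then traits) with people listed in ascending ID order within each family. The function name `pair_rows` and the trait values in `sample` are hypothetical, chosen to reproduce the example rows above:

```python
from collections import defaultdict
from itertools import combinations

def pair_rows(lines):
    """Yield one comparison row per unordered pair of people in each family."""
    families = defaultdict(list)  # family -> list of (person_id, traits)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        families[fields[0]].append((fields[1], fields[2:]))
    for family, people in families.items():
        # combinations() yields (1, 2), (1, 3), (2, 3): each pair exactly once
        for (id1, traits1), (id2, traits2) in combinations(people, 2):
            row = [family, id1, id2]
            for t1, t2 in zip(traits1, traits2):
                row.extend([t1, t2])
            yield " ".join(row)

sample = ["A 1 7 9", "A 2 8 10", "A 3 9 11"]
for row in pair_rows(sample):
    print(row)
```

Because `pair_rows` accepts any iterable of lines, an open file can be passed directly, e.g. `with open("family.txt") as fd: rows = list(pair_rows(fd))`.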

Srini · Accepted Answer

OK, suppose your data is as follows:

A 1 7 4 5 6
A 2 6 5 4 7
A 3 7 7 5 4
B 1 7 4 5 6
B 2 6 5 4 7
B 3 7 7 5 4

The first column is the family, the second column is the person_id, and all subsequent columns are traits.

The very quick and dirty code below seems to give you what you want:

def read_file(path="sample.txt"):
    # Read the file and split each non-blank line into its space-separated fields
    with open(path) as fd:
        return [line.split() for line in fd if line.strip()]

def make_output(file_lines):
    for line1c in file_lines:
        for line2c in file_lines:
            # Only compare people from the same family
            if line1c[0] != line2c[0]:
                continue
            # Skip self-comparisons and mirrored pairs (print 1 2 but not 2 1);
            # IDs are compared as integers so that 10 sorts after 2
            if int(line1c[1]) >= int(line2c[1]):
                continue
            # Family, the two person IDs, then each trait for both people in turn
            out_list = [line1c[0], line1c[1], line2c[1]]
            for i in range(2, len(line1c)):
                out_list.append(line1c[i])
                out_list.append(line2c[i])
            print(" ".join(out_list))

file_lines = read_file()
make_output(file_lines)

With the original continue criterion, which only skipped comparing a person with themselves, the output of print was:

A 1 2 7 6 4 5 5 4 6 7
A 1 3 7 7 4 7 5 5 6 4
A 2 1 6 7 5 4 4 5 7 6
A 2 3 6 7 5 7 4 5 7 4
A 3 1 7 7 7 4 5 5 4 6
A 3 2 7 6 7 5 5 4 4 7
B 1 2 7 6 4 5 5 4 6 7
B 1 3 7 7 4 7 5 5 6 4
B 2 1 6 7 5 4 4 5 7 6
B 2 3 6 7 5 7 4 5 7 4
B 3 1 7 7 7 4 5 5 4 6
B 3 2 7 6 7 5 5 4 4 7

As you can see, in family A person 1 is compared with 2 and 3, person 2 with 1 and 3, and person 3 with 1 and 2.

This produces duplicates, because every pair of people in a family is compared twice (once in each order).

It's easy to remove them by keeping a record of who has already been compared with whom.
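A sketch of that bookkeeping, assuming the same nested-loop structure over pre-split rows. Each comparison is recorded as a (family, `frozenset` of the two IDs) key, so that 1 vs 2 and 2 vs 1 count as the same pair; the helper name `unique_pairs` is hypothetical:

```python
def unique_pairs(rows):
    """Nested-loop comparison that remembers who has been compared with whom."""
    seen = set()
    for r1 in rows:
        for r2 in rows:
            if r1[0] != r2[0] or r1[1] == r2[1]:
                continue  # different family, or a person compared with themselves
            key = (r1[0], frozenset((r1[1], r2[1])))
            if key in seen:
                continue  # this pair was already emitted in the other order
            seen.add(key)
            out = [r1[0], r1[1], r2[1]]
            # Interleave the traits of the two people
            for t1, t2 in zip(r1[2:], r2[2:]):
                out.extend([t1, t2])
            yield " ".join(out)

rows = [line.split() for line in ["A 1 7 4 5 6", "A 2 6 5 4 7", "A 3 7 7 5 4"]]
for line in unique_pairs(rows):
    print(line)
```

Unlike an ordering test on the IDs, the `seen` set does not depend on how the IDs compare, at the cost of keeping one entry per pair in memory.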

P.S.: I know the script is really dirty, but I just wanted to illustrate the approach, not write production code.

EDIT: I was going to write a slightly more complicated duplicate remover, but since the data is so simple, a small change to the continue criterion (skip the pair unless the first person's ID is smaller than the second's, as in the code above) solved it. The output after this edit is:

A 1 2 7 6 4 5 5 4 6 7
A 1 3 7 7 4 7 5 5 6 4
A 2 3 6 7 5 7 4 5 7 4
B 1 2 7 6 4 5 5 4 6 7
B 1 3 7 7 4 7 5 5 6 4
B 2 3 6 7 5 7 4 5 7 4

which is free of duplicates.
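One caveat with that criterion: `split()` returns the IDs as strings, and strings compare character by character. If the IDs are compared as strings and a family has ten or more people, `"10"` sorts before `"2"`, so that pair would be printed as 10 2 rather than 2 10. Converting the IDs to integers before comparing keeps the pairs in numeric order. A small illustration, with `keep_pair` as a hypothetical helper:

```python
def keep_pair(id1, id2):
    """Return True if the pair (id1, id2) should be printed, i.e. id1 < id2 numerically."""
    return int(id1) < int(id2)

print("2" < "10")            # False: string comparison looks at '2' vs '1' first
print(keep_pair("2", "10"))  # True: numeric comparison
```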
