Python - csv column restructuring

Question

I'm totally new to python.

I have a csv file. Its structure is:

Ip, Tag, Sentence_id, scheme
#1, yes,      1,        1
#1, yes,      2,        2 
#2, no,       1,        1
#3  maybe,    3,        3

There are 100 sentence_id, a number of ips and 6 schemes (1-6).

Is there a way with python to restructure the csv so that it's output is a matrix of the following structure:

Ip,   scheme 1, scheme 2, scheme 3, sentence_id
#1      yes,      NA,          NA,      1
#1      NA,       yes,         NA,      2
#2      no,       NA,          NA,      1
#3      NA,       NA,         maybe,    3

I'm not sure if python is the "right" language to use for something like this. I've been guided to either python or awk, but have no idea of either. Thanks.

Jon Clements · Accepted Answer

You can use csv.DictReader to load the file into a list of dictionaries, then from that, find the maximum scheme value to build your output field names. For each row in the list, set the scheme N field to be equal to the value of the Tag column. Then we use csv.DictWriter to fill any missing values with NA where a key is not present, eg:

import csv

with open('input.csv', 'rb') as fin, open('output.csv', 'wb') as fout:
    rows = list(csv.DictReader(fin, skipinitialspace=True))
    schemes = range(1, max(int(row['scheme']) for row in rows) + 1)
    fieldnames = ['Ip'] + ['scheme {}'.format(i) for i in schemes] + ['Sentence_id']
    csvout = csv.DictWriter(fout, fieldnames=fieldnames, extrasaction='ignore', restval='NA')
    csvout.writeheader()
    for row in rows:
        row['scheme {}'.format(row['scheme'].strip())] = row['Tag']
        csvout.writerow(row)

This gives the following output given your example input:

Ip,scheme 1,scheme 2,scheme 3,Sentence_id
#1,yes,NA,NA,1
#1,NA,yes,NA,2
#2,no,NA,NA,1
#3,NA,NA,maybe,3

Python - csv column restructuring

Answers (2)

Related Questions