MTT

Reputation: 5263

Convert data table format in Python

Let's say I have a table in Python in the following format:

id1 V1 100
id1 V2 200
id1 V3 0
id2 V1 300
id2 V2 400
id2 V3 1

and I want to convert it to this format:

id1 100 200 0
id2 300 400 1

V1, V2 and V3 are variable names, and I do not need to keep them; I just need the ids and the values of each variable. The number of variables (Vs) is unknown, but their order is always the same. I'm looking for a smarter way to convert this than looping. Is there a built-in function?

Upvotes: 2

Views: 617

Answers (1)

Padraic Cunningham

Reputation: 180481

There is no builtin method for this, but you can group using a dict:

from collections import defaultdict

d = defaultdict(list)

s = """id1 V1 100
id1 V2 200
id1 V3 0
id2 V1 30
id2 V2 400
id2 V3 1"""

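# Split each line into (id, variable name, value) and collect the values per id.
for a, _, c in map(str.split, s.splitlines()):
    d[a].append(c)

print(["{},{}".format(k, ",".join(v)) for k, v in d.items()])
# Output (key order of a plain dict was arbitrary before Python 3.7):
['id2,30,400,1', 'id1,100,200,0']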

If you want the output aligned nicely, you can do something like this
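
One possible way to line the columns up, as a minimal sketch: pad every cell to the width of the widest one. The names `rows`, `grouped`, `width` and `format_row` are illustrative and not part of the original answer; the grouping step is the same `setdefault` approach used in this answer.

```python
from collections import OrderedDict

rows = """id1 V1 100
id1 V2 200
id1 V3 0
id2 V1 30
id2 V2 400
id2 V3 1"""

# Group the values by id, keeping first-seen order.
grouped = OrderedDict()
for key, _, value in map(str.split, rows.splitlines()):
    grouped.setdefault(key, []).append(value)

# Width of the widest cell (ids and values), used to pad every column.
width = max(len(cell) for k, v in grouped.items() for cell in [k] + v)


def format_row(key, values):
    # Left-align the id and right-align the value strings.
    return " ".join([key.ljust(width)] + [v.rjust(width) for v in values])


aligned = [format_row(k, v) for k, v in grouped.items()]
print("\n".join(aligned))
```

A single shared width keeps the sketch short; a per-column width would give tighter output when column sizes differ a lot.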

If you want to maintain order, use an OrderedDict:

from collections import OrderedDict

d = OrderedDict()

s = """id1 V1 100
id1 V2 200
id1 V3 0
id2 V1 30
id2 V2 400
id2 V3 1"""

for a, _, c in map(str.split, s.splitlines()):
    d.setdefault(a, []).append(c)

If it is in a file, you can parse it with the csv lib:

import csv
from collections import OrderedDict

d = OrderedDict()

with open("in.txt") as f:
    for k, _, v in csv.reader(f, delimiter=" "):
        d.setdefault(k, []).append(v)

The OrderedDict output:

OrderedDict([('id1', ['100', '200', '0']), ('id2', ['30', '400', '1'])])
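
To get from that grouped structure to the space-separated layout asked for in the question, something along these lines should work. This is only a sketch: `d` is rebuilt here from the output shown above so the snippet runs on its own, and `lines` is an illustrative name.

```python
from collections import OrderedDict

# Same grouped data as the OrderedDict output shown above.
d = OrderedDict([("id1", ["100", "200", "0"]), ("id2", ["30", "400", "1"])])

# One space-separated line per id: the id followed by its values in order.
lines = [" ".join([key] + values) for key, values in d.items()]
print("\n".join(lines))
```

The values stay as strings here; wrap them in `int(...)` while grouping if you need numbers.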

Upvotes: 3
