Building a dictionary of words from multiple lists in Python

Question

I have a list of 100 dictionaries, one per data point, as follows:

datapoint1 a:1 b:2 c:6
datapoint2 a:2 d:8 p:10
.....
datapoint100 c:9 d:1 z:12

I want to write a table to a file as follows:

           a b c d ...... z
datapoint1 1 2 6 0 ...... 0
datapoint2 2 0 0 8 ...... 0
.........
.........
datapoint100 0 0 9 1 ...... 12

Note that a, b, c, ..., z are only examples. The real set of words is not known beforehand, so the total is not 26; it could be 1,000 or 10,000, and a, b, ... will be replaced with real words like 'my', 'hi', 'tote', etc.

I have been thinking of trying to do it as follows:

build a dictionary of all the words (let's call it the global dictionary)
then build a list of dictionaries where each dictionary represents a data point
then map each dictionary in the list onto the global dictionary

But this method seems complicated in Python. Is there a better way of doing it?

DSM · Accepted Answer

If you don't care much about the fiddly bits of column alignment, this isn't too bad:

datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]

# get all the keys ever seen
keys = sorted(set.union(*(set(dp) for dp in datapoints)))

# open in text mode ("w"); writing str to a "wb" file fails on Python 3
with open("outfile.txt", "w") as fp:
    # write the header
    fp.write("{}\n".format(' '.join([" "] + keys)))
    # loop over each point, getting the values in order (or 0 if they're absent)
    for i, datapoint in enumerate(datapoints):
        out = '{} {}\n'.format(i, ' '.join(str(datapoint.get(k, 0)) for k in keys))
        fp.write(out)

produces

  a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12
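
If you want the rows labelled datapoint1, datapoint2, ... as in the question, one possible variant is to let the standard library's csv.DictWriter fill in the zeros through its restval argument. This is only a sketch and not part of the original answer: the function name write_table, the "name" label column, and the "datapoint" prefix are made up for illustration, and it assumes none of your words is literally "name".

```python
import csv
import io

datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]


def write_table(datapoints, fp, label="datapoint"):
    """Write one space-separated row per datapoint to the file object fp.

    Hypothetical helper: the "name" column and the label prefix are
    illustrative choices, not something the question prescribes.
    """
    # every key seen in any datapoint, sorted so the column order is stable
    keys = sorted(set().union(*datapoints))
    writer = csv.DictWriter(fp, fieldnames=["name"] + keys,
                            restval=0,          # value used for missing keys
                            delimiter=" ",
                            lineterminator="\n")
    writer.writeheader()
    for i, dp in enumerate(datapoints, start=1):
        row = {"name": "{}{}".format(label, i)}
        row.update(dp)
        writer.writerow(row)


# Demo with an in-memory buffer; for a real file, the csv docs recommend
# open("outfile.txt", "w", newline="").
buf = io.StringIO()
write_table(datapoints, buf)
print(buf.getvalue(), end="")
```

For the three sample points this prints a "name a b c d p z" header followed by one row per datapoint. Note that csv will quote any word that itself contains a space.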

As mentioned in the comments, the pandas solution is pretty nice too:

>>> import pandas as pd
>>> df = pd.DataFrame(datapoints).fillna(0).astype(int)
>>> df
   a  b  c  d   p   z
0  1  2  6  0   0   0
1  2  0  0  8  10   0
2  0  0  9  1   0  12
>>> df.to_csv("outfile_pd.csv", sep=" ")
>>> !cat outfile_pd.csv
 a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12

If you really need the columns nicely aligned, then there are ways to do that too, but I never need them so I don't know much about them.
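
For what it's worth, one simple way to get aligned columns in plain Python is to measure the widest cell in each column and pad every cell to that width. The sketch below is an illustration rather than the answer's own method: format_aligned and the "datapoint" label prefix are hypothetical names, labels are left-justified, and values are right-justified.

```python
datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]


def format_aligned(datapoints, label="datapoint"):
    """Return the table as a string with every column padded to equal width."""
    # every key seen in any datapoint, sorted so the column order is stable
    keys = sorted(set().union(*datapoints))

    # build the whole table as strings first: a header row, then one row per point
    table = [[""] + keys]
    for i, dp in enumerate(datapoints, start=1):
        name = "{}{}".format(label, i)
        table.append([name] + [str(dp.get(k, 0)) for k in keys])

    # width of a column = length of its longest cell
    widths = [max(len(cell) for cell in col) for col in zip(*table)]

    lines = []
    for row in table:
        first = row[0].ljust(widths[0])                      # label column
        rest = [cell.rjust(w) for cell, w in zip(row[1:], widths[1:])]
        lines.append(" ".join([first] + rest))
    return "\n".join(lines)


print(format_aligned(datapoints))
```

Because the whole table is held in memory to compute the widths, this suits a few thousand columns and rows; for much larger data you would probably want a fixed width per column instead.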
