user677101

Reputation: 31

Building a dictionary of words from multiple lists in Python

I have a list of 100 dictionaries, one per data point, as follows:

datapoint1 a:1 b:2 c:6
datapoint2 a:2 d:8 p:10
.....
datapoint100 c:9 d:1 z:12

I want to write a table to a file as follows:

           a b c d ...... z
datapoint1 1 2 6 0 ...... 0
datapoint2 2 0 0 8 ...... 0
.........
.........
datapoint100 0 0 9 1 ...... 12

Note that a, b, c, ..., z are only placeholders. The real set of words is not known beforehand, so there are not necessarily 26 of them; there may be 1,000 or 10,000, and a, b, ... will be replaced with real words like 'my', 'hi', 'tote', etc.

I have been thinking of doing it as follows:

  1. build a dictionary of all the words; let's call it the global dictionary
  2. then build a list of dictionaries, where each dictionary represents a data point
  3. then map the list of dictionaries onto the global dictionary

But this method seems complicated in Python. Is there a better way of doing it in Python?

Upvotes: 0

Views: 240

Answers (2)

martega

Reputation: 2143

Program:

data_points = [
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 2, 'd': 8, 'p': 10},
    {'c': 9, 'd': 1, 'z': 12},
    {'e': 3, 'f': 6, 'g': 3}
]

merged_data_points = {}

# collect every value seen for each key, across all data points
for data_point in data_points:
    for k, v in data_point.items():
        if k not in merged_data_points:
            merged_data_points[k] = []
        merged_data_points[k].append(v)

# print the merged datapoints
print('{')
for k in merged_data_points:
    print('  {0}: {1},'.format(k, merged_data_points[k]))
print('}')

Output (on Python 3.7+, where dicts keep insertion order):

{
  a: [1, 2],
  b: [2],
  c: [6, 9],
  d: [8, 1],
  p: [10],
  z: [12],
  e: [3],
  f: [6],
  g: [3],
}

Upvotes: 0

DSM

Reputation: 353019

If you don't care much about the fiddly bits of column alignment, this isn't too bad:

datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]

# get all the keys ever seen
keys = sorted(set.union(*(set(dp) for dp in datapoints)))

with open("outfile.txt", "w") as fp:
    # write the header
    fp.write("{}\n".format(' '.join([" "] + keys)))
    # loop over each point, getting the values in order (or 0 if they're absent)
    for i, datapoint in enumerate(datapoints):
        out = '{} {}\n'.format(i, ' '.join(str(datapoint.get(k, 0)) for k in keys))
        fp.write(out)

produces

  a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12
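
The same idea can also be written with the standard library's csv.DictWriter, whose restval argument fills in keys a row doesn't have. This is only a rough sketch, not a drop-in replacement: the function name write_table and the "id" label column are my own choices for illustration, and it assumes none of your words is literally "id" or contains a space (the csv module would quote those).

```python
import csv
import io


def write_table(datapoints, fp):
    """Write one space-separated row per datapoint; absent words become 0."""
    # every key seen in any datapoint, in sorted order
    keys = sorted(set().union(*datapoints))
    writer = csv.DictWriter(
        fp,
        fieldnames=["id"] + keys,  # "id" is a hypothetical label column
        restval=0,                 # value written for keys a row lacks
        delimiter=" ",
        lineterminator="\n",
    )
    writer.writeheader()
    for i, datapoint in enumerate(datapoints):
        writer.writerow({"id": i, **datapoint})


# usage: write to an in-memory buffer instead of a real file
buf = io.StringIO()
write_table([{'a': 1, 'b': 2}, {'b': 5, 'z': 12}], buf)
print(buf.getvalue())
```

With a real file you would pass open("outfile.txt", "w", newline="") instead of the StringIO buffer.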

As mentioned in the comments, the pandas solution is pretty nice too:

>>> import pandas as pd
>>> df = pd.DataFrame(datapoints).fillna(0).astype(int)
>>> df
   a  b  c  d   p   z
0  1  2  6  0   0   0
1  2  0  0  8  10   0
2  0  0  9  1   0  12
>>> df.to_csv("outfile_pd.csv", sep=" ")
>>> !cat outfile_pd.csv
 a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12

If you really need the columns nicely aligned, then there are ways to do that too, but I never need them so I don't know much about them.
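
For what it's worth, one simple way to get aligned columns is to build all the cells as strings first, find the widest cell in each column, and right-justify everything to that width. Treat this as a minimal sketch under those assumptions; format_table is a made-up helper name, and it holds the whole table in memory, which may matter with 10,000 words.

```python
def format_table(datapoints):
    """Return the table as text, with each column padded to its widest cell."""
    keys = sorted(set().union(*datapoints))
    # header row (empty label cell) followed by one row per datapoint
    rows = [[""] + keys]
    for i, datapoint in enumerate(datapoints):
        rows.append([str(i)] + [str(datapoint.get(k, 0)) for k in keys])
    # widest cell in each column
    widths = [max(len(row[c]) for row in rows) for c in range(len(rows[0]))]
    return "\n".join(
        " ".join(cell.rjust(w) for cell, w in zip(row, widths))
        for row in rows
    )


datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]
print(format_table(datapoints))
```

If you are already using pandas, df.to_string() should give you similarly aligned text without any of this.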

Upvotes: 1
