CSV from list of dictionaries with differing length and keys

I have a list of dictionaries that I want to write to a csv file. The first dictionary is of a different length and has different keys than the following dictionaries.

dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}, ...]

How do I write this to a csv-file so that the file looks like this:

A B C D E
1 2 3 4 5
    6 7 8
    . . .

Upvotes: 1

Answers (3)

eapetcho

Reputation: 527

You can also do this using only the Python standard library. My example below is similar to the one proposed by @Serge Ballesta. The code is as follows:

import csv

# sample data
data = [{'A': 1, 'B': 2}, {'C': 3, 'D': 4, 'E': 5}, {'C': 6, 'D': 7, 'E': 8}]
# Collect the field names from the elements of **data** (they are dict objects)
# and store them in a **set** so that each name appears only once
fields = set()
for item in data:
    names = set(item.keys())
    fields = fields | names   # we used the **or** i.e | operator for **set**

fields = list(fields)   # cast the fields into a list
# and sort the content so that during the display everything is in order :)
fields.sort()

# Now let's write a function that returns a cleaned version of the original data,
# i.e. one in which all data items have the same field names.

def clean_data(origdata, fieldnames):
    """Turn the original data into a new data with similar field in data items.

    Parameters
    ----------
    origdata: list of dict
         original data which will be cleaned or harmonized according to the field names
    fieldnames: list of strings
         fields names in the new data items

    Returns
    -------
    Returns a new data consisting of list of dict where all dict items have the same
    keys (i.e fieldnames)
    """
    newdata = []
    for dataitem in origdata:
        keys = dataitem.keys()
        for key in fieldnames:
            if key not in keys:
                # In this case we update the dataitem with **key** and value = ' '
                dataitem[key] = ' '
        newdata.append(dataitem)

    return newdata


def main():
    """Test the above function and display the result"""
    newdata = clean_data(data, fields)

    # write the data to a csv file
    with open("data.csv", "w", newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fields)
        writer.writeheader()
        for row in newdata:
            writer.writerow(row)

    # Now let's load the newly written csv file and print its content
    # -- some fancy display formatting here: not needed but I like it. :)
    nfields = len(fields)
    fmt = " %s |" * nfields   # each cell is followed by a column separator
    headInfo = fmt % tuple(fields)
    line = '-'* (len(headInfo)+1)
    print(line)
    print("|" + headInfo)
    print(line)
    with open("data.csv", "r", newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for item in reader:
            row = [item[field] for field in fields]
            print("|" + fmt % tuple(row))

    print(line)



main()

The script above will produce the following output:

---------------------
| A | B | C | D | E |
---------------------
| 1 | 2 |   |   |   |
|   |   | 3 | 4 | 5 |
|   |   | 6 | 7 | 8 |
---------------------
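
As a possible shortcut, the padding done by clean_data can also be left to csv.DictWriter itself: its restval parameter is the value written for any field missing from a row. The sketch below is only an illustration under that assumption; write_padded and its filler argument are hypothetical names, and an in-memory io.StringIO buffer stands in for the real file.

```python
import csv
import io

def write_padded(rows, fieldnames, out, filler=" "):
    """Write rows to `out`, letting DictWriter fill missing fields with `filler`."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval=filler)
    writer.writeheader()
    writer.writerows(rows)

rows = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}]
# Union of all keys, sorted so the columns come out in a stable order
fieldnames = sorted(set().union(*rows))

buffer = io.StringIO()   # in-memory stand-in for open("data.csv", "w", newline='')
write_padded(rows, fieldnames, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Unlike clean_data, this leaves the original dictionaries untouched, because the filler is applied only while writing.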

Upvotes: 0

Serge Ballesta

Reputation: 149075

The problem is that you will need the full column set to write the header at the beginning of the file. But apart from that, csv.DictWriter is what you need:

import csv

# optional: compute the fieldnames:
fieldnames = set()
for d in dict_list:
    fieldnames.update(d.keys())
fieldnames = sorted(fieldnames)    # sort the fieldnames...

# produce the csv file
with open("file.csv", "w", newline='') as fd:
    wr = csv.DictWriter(fd, fieldnames)
    wr.writeheader()
    wr.writerows(dict_list)

And the produced csv will look like this:

A,B,C,D,E
1,2,,,
,,3,4,5
,,6,7,8
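
Sorting works well here because the keys happen to be single letters. If the columns should instead appear in the order in which the keys are first seen, something like the following sketch might do. It assumes Python 3.7+ (where dicts keep insertion order), uses a deliberately reordered sample list, and writes to an in-memory buffer instead of a file:

```python
import csv
import io

# Hypothetical sample in which alphabetical order is NOT the order of appearance
dict_list = [{"C": 3, "D": 4, "E": 5}, {"A": 1, "B": 2}, {"C": 6, "D": 7, "E": 8}]

# dict keys are unique and keep insertion order, so this collects
# every key exactly once, in the order it is first encountered
fieldnames = list(dict.fromkeys(key for d in dict_list for key in d))

buffer = io.StringIO()   # stand-in for open("file.csv", "w", newline='')
wr = csv.DictWriter(buffer, fieldnames)
wr.writeheader()
wr.writerows(dict_list)
output_lines = buffer.getvalue().splitlines()
print(output_lines)
```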

If you really want to combine consecutive rows that have disjoint sets of keys, you could do:

# produce the csv file
with open("file.csv", "w", newline='') as fd:
    wr = csv.DictWriter(fd, sorted(fieldnames))
    old = { k: k for k in wr.fieldnames }     # use old for the header line
    for row in dict_list:
        if len(set(old.keys()).intersection(row.keys())) != 0:
            wr.writerow(old)                  # common fields: write old and start a new row
            old = dict(row)                   # copy, so that dict_list itself is not modified
        old.update(row)                       # disjoint fields: just combine
    wr.writerow(old)                          # do not forget last row

You would get:

A,B,C,D,E
1,2,3,4,5
,,6,7,8
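
The merging step can also be separated from the writing, which makes it easier to check on its own. The following is a minimal sketch of the same idea, not a drop-in replacement; merge_disjoint_rows is a hypothetical helper name, and it builds new dictionaries rather than updating the input:

```python
def merge_disjoint_rows(rows):
    """Combine consecutive dicts for as long as their keys do not overlap."""
    merged = []
    current = {}
    for row in rows:
        if current.keys() & row.keys():
            # Shared keys: the current output row is complete, start a new one
            merged.append(current)
            current = dict(row)
        else:
            # Disjoint keys: fold this dict into the current output row
            current = {**current, **row}
    if current:
        merged.append(current)   # do not forget the last row
    return merged

dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}]
merged_rows = merge_disjoint_rows(dict_list)
print(merged_rows)
```

The result can then be passed to wr.writerows() after wr.writeheader(), exactly as in the first snippet.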

Upvotes: 1

Lukas Thaler

Reputation: 2720

Pandas is able to generate a dataframe from a list of dictionaries if you call pd.DataFrame() on the list. In the resulting dataframe, every dictionary will be one row and every key will correspond to a column. The value corresponding to the 3rd key (I'll call it key3) in the 7th dict therefore will be located in the 7th row of the key3-column.

What this means for your problem: you'll first have to modify your dict_list to include the merged dictionary like so:

import pandas as pd

dict_list.insert(2, dict(**dict_list[0], **dict_list[1]))
print(dict_list)

[{'A': 1, 'B': 2},
 {'C': 3, 'D': 4, 'E': 5},
 {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5},
 {'C': 6, 'D': 7, 'E': 8}]

This inserts the combination of the first two dictionaries into your list at index 2. Why index 2? It lets you conveniently slice the list when converting it to a dataframe, which gives you the desired output:

df = pd.DataFrame(dict_list[2:])
print(df)

     A    B  C  D  E
0  1.0  2.0  3  4  5
1  NaN  NaN  6  7  8

For comparison, calling pd.DataFrame directly on the original list (that is, before the insert above) gives you:

df_unmodified = pd.DataFrame(dict_list)
print(df_unmodified)

     A    B    C    D    E
0  1.0  2.0  NaN  NaN  NaN
1  NaN  NaN  3.0  4.0  5.0
2  NaN  NaN  6.0  7.0  8.0

Afterwards, you can use df.to_csv() to save the dataframe to a CSV file. Pass index=False if you do not want the row index written as an extra column.

Upvotes: 0
