Reputation: 175
I have a list of dictionaries that I want to write to a csv file. The first dictionary is of a different length and has different keys than the following dictionaries.
dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}, ...]
How do I write this to a csv-file so that the file looks like this:
A B C D E
1 2 3 4 5
    6 7 8
. . .
Upvotes: 1
Views: 1652
Reputation: 527
You can also do this using only Python's standard library. My example below is similar to the one proposed by @Serge Ballesta. The code is as follows:
import csv
# sample data
data = [{'A': 1, 'B': 2}, {'C': 3, 'D': 4, 'E': 5}, {'C': 6, 'D': 7, 'E': 8}]
# Collect the field names from every element of **data** (they are dict objects)
# and store them in a **set** so that each name appears only once
fields = set()
for item in data:
    names = set(item.keys())
    fields = fields | names  # **|** is the union operator for sets
fields = list(fields)  # convert the set of field names to a list
# and sort it so that the columns come out in a predictable order :)
fields.sort()
# Now let's write a function that returns a cleaned copy of the original data,
# in which all data items have the same field names.
def clean_data(origdata, fieldnames):
    """Turn the original data into new data whose items all share the same fields.
    Parameters
    ----------
    origdata: list of dict
        original data which will be cleaned or harmonized according to the field names
    fieldnames: list of strings
        field names in the new data items
    Returns
    -------
    A new list of dicts in which every dict has the same keys (i.e. fieldnames)
    """
    newdata = []
    for dataitem in origdata:
        dataitem = dict(dataitem)  # copy so that the original dict is not modified
        for key in fieldnames:
            if key not in dataitem:
                # missing field: add **key** with the placeholder value ' '
                dataitem[key] = ' '
        newdata.append(dataitem)
    return newdata
def main():
    """Test the above function and display the result"""
    newdata = clean_data(data, fields)
    # write the data to a csv file
    with open("data.csv", "w", newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fields)
        writer.writeheader()
        for row in newdata:
            writer.writerow(row)
    # Now let's load our newly written csv file and print the content
    # -- some fancy display formatting here: not needed but I like it. :)
    nfields = len(fields)
    fmt = " %s |" * nfields  # one " value |" cell per field
    headInfo = fmt % tuple(fields)
    line = '-' * (len(headInfo) + 1)
    print(line)
    print("|" + headInfo)
    print(line)
    with open("data.csv", "r", newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for item in reader:
            row = [item[field] for field in fields]
            print("|" + fmt % tuple(row))
    print(line)
main()
The script above will produce the following output:
---------------------
| A | B | C | D | E |
---------------------
| 1 | 2 |   |   |   |
|   |   | 3 | 4 | 5 |
|   |   | 6 | 7 | 8 |
---------------------
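The two preparation steps above (collecting the field names, then padding every row) can also be written more compactly. The following is only a sketch under the same assumptions as the script: the sample list is called data, a single space is the placeholder, and the helper name pad_rows is made up for illustration.

```python
# sample data, same shape as in the question
data = [{'A': 1, 'B': 2}, {'C': 3, 'D': 4, 'E': 5}, {'C': 6, 'D': 7, 'E': 8}]

# union of all keys, sorted so that the column order is predictable
fields = sorted(set().union(*data))


def pad_rows(rows, fieldnames, placeholder=' '):
    """Return new dicts that all have exactly the keys in fieldnames.

    Hypothetical helper: the input dicts are left untouched.
    """
    return [{key: row.get(key, placeholder) for key in fieldnames} for row in rows]


padded = pad_rows(data, fields)
print(fields)
print(padded[0])
```

The padded list can be handed to csv.DictWriter exactly like newdata in the script above.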
Upvotes: 0
Reputation: 149075
The problem is that you need the full set of columns up front in order to write the header at the beginning of the file. Apart from that, csv.DictWriter is what you need:
import csv
# optional: compute the fieldnames:
fieldnames = set()
for d in dict_list:
    fieldnames.update(d.keys())
fieldnames = sorted(fieldnames)  # sort the fieldnames...
# produce the csv file
with open("file.csv", "w", newline='') as fd:
    wr = csv.DictWriter(fd, fieldnames)
    wr.writeheader()
    wr.writerows(dict_list)
And the produced csv will look like this:
A,B,C,D,E
1,2,,,
,,3,4,5
,,6,7,8
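The blank cells come from csv.DictWriter itself: keys that are missing from a row are written as its restval argument, which defaults to an empty string. A small sketch of that, writing to an in-memory io.StringIO buffer instead of a file (my choice, only so that the result can be inspected directly) and using "NA" as an arbitrary placeholder:

```python
import csv
import io

dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}]
fieldnames = sorted(set().union(*dict_list))

buf = io.StringIO()  # stands in for the open file object
wr = csv.DictWriter(buf, fieldnames, restval="NA")  # "NA" fills the missing keys
wr.writeheader()
wr.writerows(dict_list)

csv_lines = buf.getvalue().splitlines()
print(csv_lines)
```

Leaving restval out gives the empty cells shown above.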
If you really want to combine consecutive rows whose sets of keys are disjoint, you could do:
# produce the csv file
with open("file.csv", "w", newline='') as fd:
    wr = csv.DictWriter(fd, sorted(fieldnames))
    old = {k: k for k in wr.fieldnames}  # use old for the header line
    for row in dict_list:
        if len(set(old.keys()).intersection(row.keys())) != 0:
            wr.writerow(old)  # common fields: write old and start a new row
            old = dict(row)  # copy, so that dict_list itself is not modified
        else:
            old.update(row)  # disjoint fields: just combine
    wr.writerow(old)  # do not forget last row
You would get:
A,B,C,D,E
1,2,3,4,5
,,6,7,8
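The merging rule can also be separated from the file writing, which makes it easier to check on its own. This is a sketch rather than a drop-in replacement: the function name merge_disjoint is hypothetical, and it assumes the same rule as above, namely that a new output row starts as soon as an incoming dict repeats a key already collected.

```python
def merge_disjoint(rows):
    """Merge consecutive dicts for as long as their key sets do not overlap.

    Hypothetical helper: returns new dicts and leaves the input untouched.
    """
    merged = []
    current = {}
    for row in rows:
        if current.keys() & row.keys():
            # a key repeats, so the current output row is complete
            merged.append(current)
            current = {}
        current.update(row)
    if current:
        merged.append(current)  # do not forget the last row
    return merged


dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}]
rows = merge_disjoint(dict_list)
print(rows)
```

The resulting rows can then be written with wr.writeheader() followed by wr.writerows(rows).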
Upvotes: 1
Reputation: 2720
Pandas can generate a dataframe from a list of dictionaries if you call pd.DataFrame() on the list. In the resulting dataframe, every dictionary becomes one row and every key corresponds to a column. The value for the 3rd key (I'll call it key3) in the 7th dict will therefore be located in the 7th row of the key3 column.
What this means for your problem: you'll first have to modify your dict_list to include the merged dictionary like so:
import pandas as pd
dict_list.insert(2, dict(**dict_list[0], **dict_list[1]))
print(dict_list)
[{'A': 1, 'B': 2},
{'C': 3, 'D': 4, 'E': 5},
{'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5},
{'C': 6, 'D': 7, 'E': 8}]
This inserts the combination of the first two dictionaries into your list at index 2. Why index 2? Because it lets you conveniently slice the list when converting it to a dataframe, which gives you the desired output:
df = pd.DataFrame(dict_list[2:])
print(df)
     A    B  C  D  E
0  1.0  2.0  3  4  5
1  NaN  NaN  6  7  8
For comparison, calling pd.DataFrame directly on the original list (before the insert) gives you:
df_unmodified = pd.DataFrame(dict_list)
print(df_unmodified)
     A    B    C    D    E
0  1.0  2.0  NaN  NaN  NaN
1  NaN  NaN  3.0  4.0  5.0
2  NaN  NaN  6.0  7.0  8.0
Afterwards, you can use df.to_csv() to save the dataframe to a csv file.
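As a rough end-to-end sketch of the steps above: it builds the merged first row without modifying dict_list (a variation on the insert-and-slice approach, not the same code) and calls to_csv without a path so that the CSV text is returned as a string and can be inspected. The variable names rows and csv_text are my own.

```python
import pandas as pd

dict_list = [{"A": 1, "B": 2}, {"C": 3, "D": 4, "E": 5}, {"C": 6, "D": 7, "E": 8}]

# merge the first two dicts into one row, keep the remaining dicts as they are
rows = [{**dict_list[0], **dict_list[1]}] + dict_list[2:]
df = pd.DataFrame(rows)

# with no path given, to_csv returns the CSV text instead of writing a file;
# index=False leaves out the 0, 1, ... row labels
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a filename, for example df.to_csv("file.csv", index=False). Columns A and B come out as 1.0 and 2.0 because the missing values (NaN) turn those columns into floats.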
Upvotes: 0