Reputation: 131
I have a list of dicts as follows:
l=[{"key1":10,"author":"test","years":["2011","2013"]},{"key2":10,"author":"test2","years":["2012"]},
{"key3":14,"author":"test2","years":["2014"]}]
Now I want to collect the list of all years and put a "0" in each position whose year is missing from a dict's years key. My output should therefore look like this:
l=[{"key1":10,"author":"test","years":["2011","0","2013","0"]},{"key2":10,"author":"test2","years":["0","2012","0","0"]},
{"key3":14,"author":"test2","years":["0","0","0","2014"]}]
Is there any way to do this? I tried the following code, as suggested:
df = pd.DataFrame(l)
df.pivot(columns="years")
It doesn't work. Is there a way to do this?
Upvotes: 0
Views: 381
Reputation: 71517
Build a list of all the years:
>>> years = sorted({year for d in l for year in d["years"]})
and then use comprehensions to build dicts with the list elements replaced appropriately:
>>> [{k:
... [y if y in v else "0" for y in years]
... if k == "years" else v
... for k, v in d.items()
... } for d in l]
[{'key1': 10, 'author': 'test', 'years': ['2011', '0', '2013', '0']}, {'key2': 10, 'author': 'test2', 'years': ['0', '2012', '0', '0']}, {'key3': 14, 'author': 'test2', 'years': ['0', '0', '0', '2014']}]
You can technically do this all in a single statement (it's a bit less efficient though since you're rebuilding the set of all years multiple times):
>>> [{k: [
... y if y in v else "0" for y in sorted({
... y for d in l for y in d["years"]
... })
... ] if k == "years" else v for k, v in d.items()} for d in l]
[{'key1': 10, 'author': 'test', 'years': ['2011', '0', '2013', '0']}, {'key2': 10, 'author': 'test2', 'years': ['0', '2012', '0', '0']}, {'key3': 14, 'author': 'test2', 'years': ['0', '0', '0', '2014']}]
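If this is needed in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name pad_years and its key and fill parameters are made up for illustration, and it assumes every dict has the list key. It copies each dict rather than modifying the input, and it checks membership against a set instead of the original list.

```python
def pad_years(records, key="years", fill="0"):
    """Return new dicts whose `key` list is aligned to every year seen in `records`."""
    # Sorted list of every distinct year across all records.
    all_years = sorted({year for rec in records for year in rec[key]})
    padded = []
    for rec in records:
        present = set(rec[key])  # set lookup instead of scanning the list each time
        new_rec = dict(rec)      # shallow copy, so the input dicts are left untouched
        new_rec[key] = [y if y in present else fill for y in all_years]
        padded.append(new_rec)
    return padded


l = [
    {"key1": 10, "author": "test", "years": ["2011", "2013"]},
    {"key2": 10, "author": "test2", "years": ["2012"]},
    {"key3": 14, "author": "test2", "years": ["2014"]},
]
result = pad_years(l)
print(result[0]["years"])  # ['2011', '0', '2013', '0']
```

Because the years are strings, sorted() orders them lexicographically, which matches numeric order only while they all have the same number of digits.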
Upvotes: 2
Reputation: 22503
Here's one pandas way using crosstab:
import pandas as pd
df = pd.DataFrame(l)
temp = df.explode("years").astype({"years": int})
s = pd.crosstab(temp.index, temp["years"])
df["new"] = [i.tolist() for i in s.to_numpy()*s.columns.to_numpy()]
print(df)
   key1  author         years  key2  key3                 new
0  10.0    test  [2011, 2013]   NaN   NaN  [2011, 0, 2013, 0]
1   NaN   test2        [2012]  10.0   NaN     [0, 2012, 0, 0]
2   NaN   test2        [2014]   NaN  14.0     [0, 0, 0, 2014]
Upvotes: 1
Reputation: 2977
You don't need a DataFrame to overcomplicate this problem; you're just filling in a placeholder wherever a year is not present.
l=[{"key1":10,"author":"test","years":["2011","2013"]},{"key2":10,"author":"test2","years":["2012"]},
{"key3":14,"author":"test2","years":["2014"]}]
for obj in l:  # for each entry
    filled = []  # start a new filled list
    for year in ["2011", "2012", "2013", "2014"]:  # check whether each year is present in the years list
        if year in obj["years"]:
            filled.append(year)
        else:
            filled.append("0")
    obj["years"] = filled  # assign it back
print(l)
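The list of years in the loop above is written out by hand. If it should follow the data instead, it could be derived from the earliest and latest year that appear. This is a sketch under two assumptions that are not in the question: every year string parses as an integer, and at least one year is present (min() and max() raise on an empty sequence). The names seen and all_years are illustrative.

```python
l = [
    {"key1": 10, "author": "test", "years": ["2011", "2013"]},
    {"key2": 10, "author": "test2", "years": ["2012"]},
    {"key3": 14, "author": "test2", "years": ["2014"]},
]

# Collect every year that appears, as integers, to find the span to cover.
seen = [int(year) for obj in l for year in obj["years"]]
# One slot per calendar year from the earliest to the latest, gaps included.
all_years = [str(y) for y in range(min(seen), max(seen) + 1)]

for obj in l:
    present = set(obj["years"])
    # Keep the year where it is present, otherwise fill the slot with "0".
    obj["years"] = [y if y in present else "0" for y in all_years]

print(l)
```

Unlike building the list from only the years that occur, this keeps a slot for a year that no entry has, so every list has one position per calendar year in the range.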
Upvotes: 2