Sam

Reputation: 131

Adding zeros to a list in a dictionary in a list

I have a list of dicts as follows:

l=[{"key1":10,"author":"test","years":["2011","2013"]},{"key2":10,"author":"test2","years":["2012"]},
{"key3":14,"author":"test2","years":["2014"]}]

Now, I want to get the list of all years and insert a "0" wherever a particular year is not present under the years key. Hence, my output should look like this:

l=[{"key1":10,"author":"test","years":["2011","0","2013","0"]},{"key2":10,"author":"test2","years":["0","2012","0","0"]},
{"key3":14,"author":"test2","years":["0","0","0","2014"]}]

Is there any way to do it? I tried the following code, as suggested:

pd.Dataframe(l)
df.pivot(columns="years")

It doesn't work. Is there a way to do this?

Upvotes: 0

Views: 381

Answers (3)

Samwise

Reputation: 71517

Build a list of all the years:

>>> years = sorted({year for d in l for year in d["years"]})

and then use comprehensions to build dicts with the list elements replaced appropriately:

>>> [{k: 
...    [y if y in v else "0" for y in years] 
...    if k == "years" else v 
...    for k, v in d.items()
... } for d in l]
[{'key1': 10, 'author': 'test', 'years': ['2011', '0', '2013', '0']}, {'key2': 10, 'author': 'test2', 'years': ['0', '2012', '0', '0']}, {'key3': 14, 'author': 'test2', 'years': ['0', '0', '0', '2014']}]

You can technically do this all in a single statement (it's a bit less efficient though since you're rebuilding the set of all years multiple times):

>>> [{k: [
...     y if y in v else "0" for y in sorted({
...         y for d in l for y in d["years"]
...     })
... ] if k == "years" else v for k, v in d.items()} for d in l]
[{'key1': 10, 'author': 'test', 'years': ['2011', '0', '2013', '0']}, {'key2': 10, 'author': 'test2', 'years': ['0', '2012', '0', '0']}, {'key3': 14, 'author': 'test2', 'years': ['0', '0', '0', '2014']}]
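The two-step approach above can also be wrapped in a reusable function. This is only a minimal sketch: the name `pad_years` and the `fill` parameter are illustrative and not part of the original answer. It assumes years are four-digit strings, so sorting them as strings also sorts them chronologically.

```python
def pad_years(records, fill="0"):
    """Return new dicts whose "years" lists are aligned to every year seen."""
    # Collect every distinct year across all records, in sorted order.
    all_years = sorted({year for rec in records for year in rec["years"]})
    padded = []
    for rec in records:
        present = set(rec["years"])  # set membership checks are fast
        new_rec = dict(rec)          # shallow copy so the input is left untouched
        new_rec["years"] = [y if y in present else fill for y in all_years]
        padded.append(new_rec)
    return padded


l = [
    {"key1": 10, "author": "test", "years": ["2011", "2013"]},
    {"key2": 10, "author": "test2", "years": ["2012"]},
    {"key3": 14, "author": "test2", "years": ["2014"]},
]
result = pad_years(l)
print(result)
```

Because each record is copied before its "years" entry is replaced, the original list `l` keeps its unpadded lists.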

Upvotes: 2

Henry Yik

Reputation: 22503

Here's one pandas way using crosstab:

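import pandas as pd

df = pd.DataFrame(l)
# One row per (record, year) pair; cast years to int so they can be multiplied
temp = df.explode("years").astype({"years": int})
# 0/1 indicator table: rows are the original records, columns are the sorted years
s = pd.crosstab(temp.index, temp["years"])
# Multiply each indicator row by the column labels to turn 1s into the year itself
df["new"] = [i.tolist() for i in s.to_numpy()*s.columns.to_numpy()]

print(df)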

   key1 author         years  key2  key3                 new
0  10.0   test  [2011, 2013]   NaN   NaN  [2011, 0, 2013, 0]
1   NaN  test2        [2012]  10.0   NaN     [0, 2012, 0, 0]
2   NaN  test2        [2014]   NaN  14.0     [0, 0, 0, 2014]
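Note that the "new" column above holds integers, while the question asked for strings such as "0". A possible variant, sketched here under the assumption that the years should stay as strings, skips the integer cast and uses the crosstab counts only as a presence indicator:

```python
import pandas as pd

l = [
    {"key1": 10, "author": "test", "years": ["2011", "2013"]},
    {"key2": 10, "author": "test2", "years": ["2012"]},
    {"key3": 14, "author": "test2", "years": ["2014"]},
]

df = pd.DataFrame(l)
temp = df.explode("years")  # one row per (record, year); years stay strings
# Counts per record and year: nonzero where the record contains that year
s = pd.crosstab(temp.index, temp["years"])
# Keep the column label where the count is nonzero, otherwise use "0"
df["new"] = [
    [year if present else "0" for year, present in zip(s.columns, row)]
    for row in s.to_numpy()
]

print(df["new"].tolist())
```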

Upvotes: 1

kennysliding

Reputation: 2977

You don't need a DataFrame to overcomplicate this problem; you're just filling in a placeholder wherever a year is not present.

l=[{"key1":10,"author":"test","years":["2011","2013"]},{"key2":10,"author":"test2","years":["2012"]},
{"key3":14,"author":"test2","years":["2014"]}]

# Collect every year that appears anywhere, so no year is hardcoded
all_years = sorted({year for obj in l for year in obj["years"]})

for obj in l:  # for each entry
    filled = []  # build a new, padded list
    for year in all_years:  # check whether each year is present in the years list
        if year in obj["years"]:
            filled.append(year)
        else:
            filled.append("0")
    obj["years"] = filled  # assign it back

print(l)

Upvotes: 2
