Reputation: 37
I'm new with Pandas, looking for a method to add missing value in the dataframe. any idea how to produce from the input to output? Thanks
Input
Module Status Domain
0 1 N www.cat.com
1 2 N www.cat.com
2 5 Y www.cat.com
3 7 Y www.cat.com
4 8 Y www.cat.com
5 9 N www.cat.com
Expected
Module Status Domain
0 1 N www.cat.com
1 2 N www.cat.com
2 5 Y www.cat.com
3 7 Y www.cat.com
4 8 Y www.cat.com
5 9 N www.cat.com
6 3 N www.cat.com
7 4 N www.cat.com
8 6 N www.cat.com
Upvotes: 0
Views: 143
Reputation: 9197
You can try this:
status = set(range(1,10)) - set(df['Module'])
df.append(pd.DataFrame([df[df['Status']=='N'].iloc[0]]*len(status)).assign(Module=status)).reset_index(drop=True)
Result:
# Module Status Domain
# 0 1 N www.cat.com
# 1 2 N www.cat.com
# 2 5 Y www.cat.com
# 3 7 Y www.cat.com
# 4 8 Y www.cat.com
# 5 9 N www.cat.com
# 6 3 N www.cat.com
# 7 4 N www.cat.com
# 8 6 N www.cat.com
Upvotes: 0
Reputation: 35626
A reindexing option:
df = df.set_index('Module') \
.reindex(np.arange(df["Module"].min(), df["Module"].max() + 1)) \
.fillna(pd.Series(['N', 'www.cat.com'], index=['Status', 'Domain'])) \
.reset_index()
print(df)
Module Status Domain
0 1 N www.cat.com
1 2 N www.cat.com
2 3 N www.cat.com
3 4 N www.cat.com
4 5 Y www.cat.com
5 6 N www.cat.com
6 7 Y www.cat.com
7 8 Y www.cat.com
8 9 N www.cat.com
Upvotes: 2
Reputation: 195428
Try:
mn, mx = df["Module"].min(), df["Module"].max()
missing = list(set(df["Module"]).symmetric_difference(range(mn, mx + 1)))
df = (
pd.concat(
[df, pd.DataFrame({"Module": missing, "Status": "N", "Domain": np.nan})]
)
.ffill()
.reset_index(drop=True)
)
print(df)
Prints:
Module Status Domain
0 1 N www.cat.com
1 2 N www.cat.com
2 5 Y www.cat.com
3 7 Y www.cat.com
4 8 Y www.cat.com
5 9 N www.cat.com
6 3 N www.cat.com
7 4 N www.cat.com
8 6 N www.cat.com
Upvotes: 1
Reputation: 348
You can create a list of Module column and using loop match with expected list to be present and create another dataframe when the no is not match. After then merge the two dataframe
Upvotes: 0