KKK
KKK

Reputation: 37

Pandas add missing value

I'm new with Pandas, looking for a method to add missing value in the dataframe. any idea how to produce from the input to output? Thanks

Input 

      Module  Status  Domain
0        1    N      www.cat.com
1        2    N      www.cat.com
2        5    Y      www.cat.com
3        7    Y      www.cat.com
4        8    Y      www.cat.com
5        9    N      www.cat.com

Expected 

      Module  Status  Domain
0        1    N      www.cat.com
1        2    N      www.cat.com
2        5    Y      www.cat.com
3        7    Y      www.cat.com
4        8    Y      www.cat.com
5        9    N      www.cat.com
6        3    N      www.cat.com
7        4    N      www.cat.com
8        6    N      www.cat.com

Upvotes: 0

Views: 143

Answers (4)

Andreas
Andreas

Reputation: 9197

You can try this:

status = set(range(1,10)) - set(df['Module'])
df.append(pd.DataFrame([df[df['Status']=='N'].iloc[0]]*len(status)).assign(Module=status)).reset_index(drop=True)

Result:

#    Module Status       Domain
# 0       1      N  www.cat.com
# 1       2      N  www.cat.com
# 2       5      Y  www.cat.com
# 3       7      Y  www.cat.com
# 4       8      Y  www.cat.com
# 5       9      N  www.cat.com
# 6       3      N  www.cat.com
# 7       4      N  www.cat.com
# 8       6      N  www.cat.com

Upvotes: 0

Henry Ecker
Henry Ecker

Reputation: 35626

A reindexing option:

df = df.set_index('Module') \
    .reindex(np.arange(df["Module"].min(), df["Module"].max() + 1)) \
    .fillna(pd.Series(['N', 'www.cat.com'], index=['Status', 'Domain'])) \
    .reset_index()
print(df)
   Module Status       Domain
0       1      N  www.cat.com
1       2      N  www.cat.com
2       3      N  www.cat.com
3       4      N  www.cat.com
4       5      Y  www.cat.com
5       6      N  www.cat.com
6       7      Y  www.cat.com
7       8      Y  www.cat.com
8       9      N  www.cat.com

Upvotes: 2

Andrej Kesely
Andrej Kesely

Reputation: 195428

Try:

mn, mx = df["Module"].min(), df["Module"].max()
missing = list(set(df["Module"]).symmetric_difference(range(mn, mx + 1)))

df = (
    pd.concat(
        [df, pd.DataFrame({"Module": missing, "Status": "N", "Domain": np.nan})]
    )
    .ffill()
    .reset_index(drop=True)
)
print(df)

Prints:

   Module Status       Domain
0       1      N  www.cat.com
1       2      N  www.cat.com
2       5      Y  www.cat.com
3       7      Y  www.cat.com
4       8      Y  www.cat.com
5       9      N  www.cat.com
6       3      N  www.cat.com
7       4      N  www.cat.com
8       6      N  www.cat.com

Upvotes: 1

Daniya Niazi
Daniya Niazi

Reputation: 348

You can create a list of Module column and using loop match with expected list to be present and create another dataframe when the no is not match. After then merge the two dataframe

Upvotes: 0

Related Questions