Reputation: 157
I have a data frame like the one below:
member_id | loan_amnt | Age | Marital_status
AK219 | 49539.09 | 34 | Married
AK314 | 1022454.00 | 37 | NA
BN204 | 75422.00 | 34 | Single
I want to create an output file in the format below:
Columns | Null Values | Duplicate
member_id | N | N
loan_amnt | N | N
Age | N | Y
Marital_status | Y | N
I know about a Python package called pandas-profiling,
but I want to build this myself in the format above so that I can extend my code for different data sets.
Upvotes: 1
Views: 706
Reputation: 11
Actually, pandas-profiling gives you several options for checking whether a column contains repeated values.
Upvotes: 0
Reputation: 2032
Here is a Python one-liner (assuming pandas is imported as pd):
pd.concat([df.isnull().any(), df.apply(lambda x: x.count() != x.nunique())], axis=1, keys=["Null Values", "Duplicate"]).replace({True: "Y", False: "N"})
Upvotes: 0
Reputation: 75150
Use something like:
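import pandas as pd

# True where a value repeats an earlier value in the same column (repeated NaN counts too)
m = df.apply(lambda x: x.duplicated())
# True where a value is missing
n = df.isna()
df_new = (pd.concat([n.any().rename('Null_Values'), m.any().rename('Duplicates')], axis=1)
          .replace({True: 'Y', False: 'N'}))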
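For completeness, here is a minimal end-to-end sketch of the same idea, run on the sample data from the question. The function name `column_report` is my own (hypothetical), and dropping missing values before checking for duplicates is a design choice of this sketch, not something the question specifies.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the NA entry is represented as np.nan.
df = pd.DataFrame({
    "member_id": ["AK219", "AK314", "BN204"],
    "loan_amnt": [49539.09, 1022454.00, 75422.00],
    "Age": [34, 37, 34],
    "Marital_status": ["Married", np.nan, "Single"],
})


def column_report(frame):
    """Return one row per column with Y/N flags for nulls and duplicates.

    Hypothetical helper: the name and the output column labels are
    assumptions taken from the layout requested in the question.
    """
    # True for each column that contains at least one missing value
    has_null = frame.isna().any()
    # dropna() first so repeated missing values are not reported as duplicates
    has_dup = frame.apply(lambda col: col.dropna().duplicated().any())
    return pd.DataFrame({
        "Columns": frame.columns,
        "Null Values": np.where(has_null, "Y", "N"),
        "Duplicate": np.where(has_dup, "Y", "N"),
    })


report = column_report(df)
print(report.to_string(index=False))
```

To get an output file, `report.to_csv("report.csv", index=False)` should work; the file name is just a placeholder.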
Upvotes: 2