Reputation: 1097
This is a follow-up question to this post.
I'm working on a large dataset that is very messy, containing random strings scattered throughout with no order. I would like to check these values column-wise without printing the numeric values of those columns. For example, if df.Col should be float but is stored as object and contains strings such as #$, ^%#, etc., is there a way to view/print only those values?
EDIT:
For example, I have a column Client_Income.
print(str(df["Client_Income"].name).upper())
print("No. of unique values:", df["Client_Income"].nunique())
print(df["Client_Income"].unique())
CLIENT_INCOME
No. of unique values: 1217
['6750' '20250' '18000' ... '13140' '9764.1' '12840.75']
While this suggests that the column could be purely float (perhaps deliberately stored as object), when I do .astype("float16"), for example, I get the error Cannot convert to float: '$'. There are other columns that contain values such as 'XNA', '##', etc.
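A minimal repro of the failure, assuming a small toy column with one stray string (the values here are made up for illustration):

import pandas as pd

toy = pd.Series(['6750', '20250', '$', '18000'], name='Client_Income')
# Raises ValueError because '$' cannot be parsed as a number
toy.astype("float16")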
Upvotes: 1
Views: 42
Reputation: 863166
Use to_numeric with errors='coerce' to convert the values to numeric; where conversion fails, NaN is created. Then filter for rows that are NaN in the converted Series but not NaN in the original column:
import pandas as pd
import numpy as np

df = pd.DataFrame({
'Client_Income':['6750', '20.250', '18000','XNA', '##45', '#4.5#', np.nan],
})
s = pd.to_numeric(df['Client_Income'], errors='coerce')
out1 = df.loc[s.isna() & df['Client_Income'].notna(), 'Client_Income']
print(out1)
3 XNA
4 ##45
5 #4.5#
Name: Client_Income, dtype: object
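If you need the same check across many columns, one possible extension (a sketch, assuming the affected columns are all stored as object dtype) is to loop over the object columns and collect the values that fail conversion:

# Apply the same coerce-and-filter idea to every object-dtype column
for col in df.select_dtypes(include='object').columns:
    converted = pd.to_numeric(df[col], errors='coerce')
    bad = df.loc[converted.isna() & df[col].notna(), col]
    if not bad.empty:
        print(col)
        print(bad.unique())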
Upvotes: 1