Reputation: 2790
I want to count the number of occurrences per column of two different values the first one is the number of null and the second one is the number of \N
in my dataframe. Example I've got:
A B C D E D
1 \N 1 \N 12 1
2 4 \N 3 0 \N
3 4 M \N 1
I'm expected the following result:
A 2
B 1
C 2
D 1
E 1
F 2
I already succeed to count the number of missing value with the following code:
df = pd.read_csv("mypath/myFile", sep=',')
null_value = df.isnull().sum()
But the following code doesn't work:
break_line = df[df == '\N'].count()
return break_line + null_value
I get the following error
TypeError: Could not compare ['\N'] with block values
Upvotes: 1
Views: 403
Reputation: 17882
A solution which uses only vectorized calculations:
df.isna().sum() + (df == '\\N').sum()
Output:
A 2
B 1
C 2
D 1
E 1
F 2
Upvotes: 0
Reputation: 14963
one liner:
ns = df.applymap(lambda x: x == '\N').sum(axis = 0)
null_value + ns
A 2
B 1
C 2
D 1
E 1
F 2
Upvotes: 2
Reputation: 109686
I assume you only want to count values where the string ends with '\N'. If not, you can use str.contains
instead.
I use a dictionary comprehension to loop through the columns of the dataframe and a vectorized str
function to count the number of rows with \N
at the end.
df = pd.DataFrame({'A': ['\N', 4, None],
'B': [1, None, 4],
'C': ['\N', '\N', 'M'],
'D': [12, 3, None],
'E': [1, 0, '\N'],
'F': [None, '\N', 1]})
>>> df
A B C D E F
0 \N 1 \N 12 1 None
1 4 NaN \N 3 0 \N
2 None 4 M NaN \N 1
>>> pd.Series({col: df[col].str.endswith('\N').sum()
if df[col].dtype == 'object' else 0
for col in df}) + df.isnull().sum()
A 2
B 1
C 2
D 1
E 1
F 2
dtype: int64
Upvotes: 1