Reputation: 33
I have a file named employee.csv with two columns: empid (integer) and empname (string). I am reading the file into a dataframe d1 with an explicitly defined schema, and into another dataframe d2 as-is. employee.csv contains the following data:
01,A\n
02,B\n
3,C\n
D,d\n
I want to list the rows where empid is not an integer. To find the bad rows, I cast the empid column in d2 to integer and used subtract, but the row D,d now shows up as null,d in the output of subtract. How do I get the desired output?
I also tried to filter for rows that fail to cast to integer, but that doesn't seem to work either.
d3 = d2.filter(d2["empid"].cast("int").isNull())
Please let me know how to achieve this.
Upvotes: 0
Views: 56
Reputation: 1787
To return a list of the empid values that are not integers, you can filter with a boolean mask built from isdigit() (this assumes d2 is a pandas DataFrame):
out = d2.loc[~d2['empid'].astype(str).str.isdigit(), 'empid'].tolist()
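Whichever dataframe library is in use, the key point is to test empid while it is still an unparsed string, before any cast turns the bad value into null. Below is a minimal plain-Python sketch of that check on the sample data from the question. The names RAW, INT_PATTERN and split_rows are made up for illustration, and the pattern assumes "integer" means digits with an optional sign.

```python
import csv
import io
import re

# Sample data from the question; the "\n" markers are treated as line endings.
RAW = "01,A\n02,B\n3,C\nD,d\n"

# Assumption: an integer is an optional sign followed by ASCII digits only.
INT_PATTERN = re.compile(r"[+-]?[0-9]+")


def split_rows(text):
    """Return (good_rows, bad_rows), judging empid as an unparsed string."""
    good, bad = [], []
    for empid, empname in csv.reader(io.StringIO(text)):
        # fullmatch requires the whole field to look like an integer.
        target = good if INT_PATTERN.fullmatch(empid.strip()) else bad
        target.append((empid, empname))
    return good, bad


good_rows, bad_rows = split_rows(RAW)
print(bad_rows)  # [('D', 'd')]
```

The original row D,d is preserved because nothing is cast before the test. On a Spark dataframe, the same idea would probably be a pattern test on the string column (for example with Column.rlike) rather than a cast followed by subtract.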
Upvotes: 0