mikebmassey

Reputation: 8584

Wrong R data type or bad data?

I'm having trouble running simple functions on a data frame, and I'm unsure whether the problem is the data type of the columns or bad data in the data frame.

I exported the results of a SQL query to a CSV file, loaded it into a data frame, and then attached it:

df <- read.csv("~/Desktop/orders.csv")
attach(df)

When that is done and I run str(df), here is what I get:

$ AccountID: Factor w/ 18093 levels "(819947 row(s) affected)",..: 10 97 167 207 207 299 299 309 352 573 ...
$ OrderID   : int  1874197767 1874197860 1874196789 1874206918 1874209100 1874207018 1874209111 1874233050 1874196791 1875081598 ...
$ OrderDate : Factor w/ 280 levels "","2010-09-24",..: 2 2 2 2 2 2 2 2 2 2 ...
$ NumofProducts  : int  16 6 4 6 10 4 2 4 6 40 ...
$ OrderTotal    : num  20.3 13.8 12.5 13.8 16.4 ...
$ SpecialOrder : int  1 1 1 1 1 1 1 1 1 1 ...   

When I try to run the following functions, here is what I get:

> length(OrderID)
[1] 0

> min(OrderTotal)
[1] NA

> min(OrderTotal, na.rm=TRUE)
[1] 5.00

> mean(NumofProducts)
[1] NA

> mean(NumofProducts, na.rm=TRUE)
[1] 3.462902

I have two questions related to this data frame:

Upvotes: 0

Views: 263

Answers (1)

Spacedman

Reputation: 94172

The difference between num and int is pretty irrelevant at this stage.

See help(is.na) for starters on NA handling. Do things like:

sum(is.na(foo))

to see how many values of foo are NA. Then things like:

df[is.na(df$foo),]

to see the rows of df where foo is NA.
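
For example, a quick sketch using the column names from your str() output (adjust to your actual data frame):

# count the NA values in each numeric column
sum(is.na(df$OrderTotal))
sum(is.na(df$NumofProducts))

# inspect the rows where OrderTotal is NA
df[is.na(df$OrderTotal), ]

That should tell you whether the NAs come from a few malformed rows (such as the "(819947 row(s) affected)" line that shows up as a factor level in AccountID) or from genuinely missing values.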

Upvotes: 2
