Reputation: 35
Update:
sum(data[,"employee_count"], na.rm = TRUE)
I have original data as:
employee_count
1-49
0
150-249
1-49
1000+
I wrote the following code:
data$employee_count <- as.character(data$employee_count)
data[data$employee_count=="1-49","employee_count"]<-1
data[data$employee_count=="50-149","employee_count"]<-2
data[data$employee_count=="150-249","employee_count"]<-3
data[data$employee_count=="250-499","employee_count"]<-4
data[data$employee_count=="500-749","employee_count"]<-5
data[data$employee_count=="750-999","employee_count"]<-6
data[data$employee_count=="1000+","employee_count"]<-7
After that, the data looks like this:
employee_count
"1"
"0"
"3"
"1"
"7"
Then I tried to convert it to numeric:
data$employee_count<-as.numeric(as.character(data$employee_count))
After this, the data shows as 1 0 3 1 7, but sum(data$employee_count) returns NA, so I suppose something is wrong.
The desired result is to actually convert this column to numbers that can be used in any kind of calculation.
For example:
If I write data[1,"employee_count"]+data[2,"employee_count"], the result should be 1+0 = 1.
If I write sum(data$employee_count), the result should be 1+0+3+1+7 = 12.
If I write data[3,"employee_count"]*data[4,"employee_count"], the result should be 3*1 = 3.
Upvotes: 1
Views: 237
Reputation: 887118
sum(as.numeric(factor(data[,1], levels=unique(data[,1]))))
#[1] 6
If you check the order
as.numeric(factor(data[,1], levels=unique(data[,1])))
#[1] 1 2 3
which is not the same as
as.numeric(factor(data[,1]))
#[1] 1 3 2
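The difference comes from how factor() picks its levels: by default it sorts the unique values as strings, so "150-249" sorts before "50-149". A minimal sketch of this, using a standalone vector x (a name introduced here for illustration) with the same three labels:

```r
# Same three labels as the small example; x is a hypothetical standalone vector
x <- c("1-49", "50-149", "150-249")

# Default levels are the sorted unique values, compared as strings
levels(factor(x))

# Explicit levels keep the order of first appearance
levels(factor(x, levels = unique(x)))

# The integer codes follow the level order, not the numeric meaning of the labels
default_codes <- as.numeric(factor(x))
ordered_codes <- as.numeric(factor(x, levels = unique(x)))
default_codes
ordered_codes
```

Note that unique() only gives the intended order when the labels happen to first appear in bracket order; otherwise the levels need to be written out explicitly.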
# Data used in the example above
data <- structure(list(employee_count = c("1-49", "50-149", "150-249"
)), .Names = "employee_count", class = "data.frame", row.names = c(NA,
-3L))
# A larger example that covers every bracket
data <- structure(list(employee_count = c("1-49", "0", "150-249", "250-499",
"1-49", "500-749", "500-749", "750-999", "50-149", "1000+", "150-249"
)), .Names = "employee_count", row.names = c(NA, -11L), class = "data.frame")
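data1 <- data  # keep a copy of the original labels
# List the levels in bracket order; subtract 1 so that "0" maps to 0
data[,1] <- as.numeric(factor(data[,1],
    levels=c('0', '1-49', '50-149', '150-249', '250-499', '500-749', '750-999', '1000+')))-1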
data[,1]
#[1] 1 0 3 4 1 5 5 6 2 7 3
data1[,1]
#[1] "1-49" "0" "150-249" "250-499" "1-49" "500-749" "500-749"
#[8] "750-999" "50-149" "1000+" "150-249"
sum(data[,1])
#[1] 37
data[3,"employee_count"]*data[4,"employee_count"]
#[1] 12  # differs from the 3 in the question because this example uses different data
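As for the NA from sum() in the question, one possible cause (an assumption, since the full data is not shown) is that the real column contains a value that none of the replacements match, which then becomes NA on conversion. The sketch below illustrates this with a named lookup vector instead of a factor; bracket_codes and the "unknown" label are hypothetical names introduced for the example.

```r
# Hypothetical lookup table: bracket label -> code
bracket_codes <- c("0" = 0, "1-49" = 1, "50-149" = 2, "150-249" = 3,
                   "250-499" = 4, "500-749" = 5, "750-999" = 6, "1000+" = 7)

# The five values from the question, plus one label that is not in the table
employee_count <- c("1-49", "0", "150-249", "1-49", "1000+", "unknown")

# Indexing a named vector by name gives NA for names it does not contain
codes <- unname(bracket_codes[employee_count])
codes

total_with_na <- sum(codes)                 # NA as soon as any element is NA
total_na_rm   <- sum(codes, na.rm = TRUE)   # ignores the NA
unmatched     <- which(is.na(codes))        # positions that failed to map

total_with_na
total_na_rm
unmatched
```

Checking which(is.na(...)) before relying on na.rm = TRUE shows whether the NA values are genuinely missing data or labels that were left out of the mapping.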
Upvotes: 2