Reputation: 159
I have a data frame. I want to normalize columns 2 and 3 by dividing each of them by the row-wise maximum of columns 2 and 3.
> testdf<- data.frame("a"=c("b",2), "b"=2:3, "c"=3:4, "d"=4:5, stringsAsFactors = F)
> testdf
a b c d
1 b 2 3 4
2 2 3 4 5
> testdf[2:3]<-testdf[2:3] / do.call(pmax, testdf[2:3])
> testdf
a b c d
1 b 0.6666667 1 4
2 2 0.7500000 1 5
Notice how the df contains a mix of numerical and string values? Now I want to add a row with more data. If the first element of the added row is a string, the code gives an error.
> testdf<- data.frame("a"=c("b",2), "b"=2:3, "c"=3:4, "d"=4:5, stringsAsFactors = F)
> testdf
a b c d
1 b 2 3 4
2 2 3 4 5
> testdf<- testdf %>% rbind(c("a",6,7,8))
> testdf
a b c d
1 b 2 3 4
2 2 3 4 5
3 a 6 7 8
> testdf[2:3]<-testdf[2:3] / do.call(pmax, testdf[2:3])
Error in FUN(left, right) : non-numeric argument to binary operator
If instead I use only numerical values, it works.
> testdf<- data.frame("a"=c("b",2), "b"=2:3, "c"=3:4, "d"=4:5, stringsAsFactors = F)
> testdf
a b c d
1 b 2 3 4
2 2 3 4 5
> testdf<- testdf %>% rbind(c(5,6,7,8))
> testdf
a b c d
1 b 2 3 4
2 2 3 4 5
3 5 6 7 8
> testdf[2:3]<-testdf[2:3] / do.call(pmax, testdf[2:3])
> testdf
a b c d
1 b 0.6666667 1 4
2 2 0.7500000 1 5
3 5 0.8571429 1 8
Any help as to why this happens is greatly appreciated. I need to be able to add rows that contain both text and numbers while keeping the code working. My guess is that I'm messing up the types, but I couldn't figure out how.
Upvotes: 1
Views: 209
Reputation: 32548
When you do rbind(c("a",6,7,8)) you are effectively doing rbind(c("a","6","7","8")), thereby making everything in testdf character. This is because a vector (c(...) or an individual column of testdf) can hold data of only one type, and R coerces the values to a type that can accommodate all of them. In this case, character can store all the data, whereas numeric would lose the letters.
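The coercion described above can be checked directly. This is a small sketch that reuses the testdf from the question; the names new_row and bad are only illustrative:

```r
# c() coerces mixed inputs to a single type, here character
new_row <- c("a", 6, 7, 8)
class(new_row)

testdf <- data.frame(a = c("b", 2), b = 2:3, c = 3:4, d = 4:5,
                     stringsAsFactors = FALSE)
sapply(testdf, class)   # a is character; b, c and d are integer

# Appending the character vector turns every column into character,
# which is why the later division fails
bad <- rbind(testdf, new_row)
sapply(bad, class)      # all four columns are now character
```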
Just use testdf %>% rbind(list("a",6,7,8)) instead of testdf %>% rbind(c("a",6,7,8)).
Compare the output of list("a",6,7,8) vs that of c("a",6,7,8).
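To make that comparison concrete, here is a short sketch using the same testdf as in the question (the name good is only illustrative). It uses plain rbind() rather than the pipe so that no extra package is needed:

```r
str(c("a", 6, 7, 8))      # a single character vector
str(list("a", 6, 7, 8))   # a list: one character element, three numeric ones

testdf <- data.frame(a = c("b", 2), b = 2:3, c = 3:4, d = 4:5,
                     stringsAsFactors = FALSE)

# A list keeps each element's own type, so b, c and d stay numeric
good <- rbind(testdf, list("a", 6, 7, 8))
sapply(good, class)

# The normalization from the question now runs without error
good[2:3] <- good[2:3] / do.call(pmax, good[2:3])
good
```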
Upvotes: 1
Reputation: 887008
We can use add_row
library(tibble)
# setNames() is base R, so nothing beyond tibble has to be loaded for this line
testdf <- add_row(testdf, !!!setNames(list('a', 6, 7, 8), names(testdf)))
testdf
# a b c d
#1 b 2 3 4
#2 2 3 4 5
#3 a 6 7 8
Now, do the pmax on the numeric columns
testdf[2:3] / do.call(pmax, testdf[2:3])
# b c
#1 0.6666667 1
#2 0.7500000 1
#3 0.8571429 1
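If a data frame has already been coerced to character by an earlier rbind(c(...)), the division will still fail. A minimal base R sketch, assuming the columns to normalize are known by name, converts them back to numeric first. The helper name normalize_by_row_max and the example frame coerced are hypothetical, not part of the original code:

```r
normalize_by_row_max <- function(df, cols) {
  # Convert back to numeric in case an earlier rbind(c(...)) made them character
  df[cols] <- lapply(df[cols], as.numeric)
  # Divide each selected column by the row-wise maximum across those columns
  df[cols] <- df[cols] / do.call(pmax, df[cols])
  df
}

# A frame in the all-character state that rbind(c("a", 6, 7, 8)) produces
coerced <- data.frame(a = c("b", "2", "a"), b = c("2", "3", "6"),
                      c = c("3", "4", "7"), d = c("4", "5", "8"),
                      stringsAsFactors = FALSE)

result <- normalize_by_row_max(coerced, c("b", "c"))
sapply(result[c("b", "c")], class)   # both columns are numeric again
```

Columns that are not listed in cols, such as a and d here, are left untouched.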
Upvotes: 1