Reputation: 995
I have the following issue: I import data from a CSV file. The imported data looks like this:
df <- data.frame(x=c(1,2,3,4,5), y=c("K","M",NA,NA,"K"))
where K denotes 1,000 and M denotes 1,000,000. I would like to create a new column with dplyr, using a named vector to look up the multiplier for K and M and multiply it by the values in the x column:
sul <- c("K"=1000, "M"=1000000, "NA"=1)
So using dplyr:
df %>% mutate(result=x * sul[y])
My problem, though, is that the values imported from the CSV are not recognized in sul[y], and I get either NA or NULL. Do you have an idea how to solve this in an elegant way? Is there a better way than running:
df$y[is.na(df$y)] <- 1
Thanks a lot!
P.S. I chose subsetting by a named vector instead of a for loop to speed up processing of the data.
Upvotes: 2
Views: 31
Reputation: 887048
It may be better to replace NA with 'Other' and then do
sul <- c(K=1000, M=1000000, Other=1)
df %>%
  mutate(y1 = replace(as.character(y), is.na(y), "Other"),
         result = x * sul[y1]) %>%
  select(-y1)
#  x    y  result
#1 1    K    1000
#2 2    M 2000000
#3 3 <NA>       3
#4 4 <NA>       4
#5 5    K    5000
The 'NA' in sul is a character string and not a real NA. So, if we are using the 'sul' from the OP's post, replace the NA in 'y' with "NA":
df %>%
  mutate(result = x * sul[replace(as.character(y), is.na(y), "NA")])
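To see why the original lookup fails, here is a minimal base R sketch using the question's data as plain vectors (x and y stand in for the data frame columns, and mult is a name introduced only for this illustration). It also shows one possible alternative, which is an assumption on my part rather than the approach above: do the lookup first and then treat unmatched entries as a multiplier of 1. The as.character() call is there because a factor index would be used by its integer codes rather than its labels.

```r
# Lookup vector from the question; "NA" here is an ordinary name string.
sul <- c("K" = 1000, "M" = 1000000, "NA" = 1)
x <- c(1, 2, 3, 4, 5)
y <- c("K", "M", NA, NA, "K")

# A real NA index never matches any name, not even "NA",
# so positions 3 and 4 of the lookup come back as NA.
lookup <- unname(sul[y])
print(lookup)

# Alternative sketch: look up the multiplier, then fill missing ones with 1.
mult <- unname(sul[as.character(y)])
mult[is.na(mult)] <- 1
result <- x * mult
print(result)
```

Inside mutate() the same idea would apply to the y column, leaving y itself untouched.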
Upvotes: 1