tnt
tnt

Reputation: 1459

Change some numerical values in data frame

I have a dataframe with incorrect values for two variables (Lat and Lon). The incorrect values in the dataframe are listed as 999.00, and the correct values should be 42.68 and -72.47, respectively.

I would like an easy way to replace these values using dplyr, but my attempts (see below) have been unsuccessful (errors provided below).

df$Lat2 <- recode(df$Lat, "999.00"="42.68", .default=x)

Error in lapply(x, f) : object 'x' not found

df <- df %>%
mutate(Lat2 = if_else(Lat == 999.00, 42.68, NULL, NULL))

Error in mutate_impl(.data, dots) : Evaluation error: unused argument (recvLat = 999).

df <- df %>%
mutate(Lat2 = ifelse(Lat == 999.00, 42.68, NULL))

Error in mutate_impl(.data, dots) : Evaluation error: replacement has length zero. In addition: Warning message: In rep(no, length.out = length(ans)) : 'x' is NULL so the result will be NULL

df <- df %>%
mutate(Lat2 = case_when(Lat == 999.00 ~ 42.68, TRUE ~ NULL))

Error in mutate_impl(.data, dots) : Evaluation error: subscript out of bounds.

For the latter three attempts, I get the same error if the number are in quotes (i.e. "999.00" and "42.68")

Upvotes: 1

Views: 241

Answers (3)

akrun
akrun

Reputation: 887971

We can place NULL in a list

df %>%
    mutate(Lat2 = ifelse(recvLat == 999.00, 42.68, list(NULL)))
#  recvLat  Lat2
#1   999.0 42.68
#2     1.5  NULL
#3     2.5  NULL

Instead of NULL, it can be NA

df %>%
    mutate(Lat2 = ifelse(recvLat == 999.00, 42.68, NA_real_))
#  recvLat  Lat2
#1   999.0 42.68
#2     1.5    NA
#3     2.5    NA

If we want to do the opposite, just use !=

df %>%
    mutate(Lat2 = ifelse(recvLat != 999.00, 42.68, NA_real_))

Based on the OP's comment,

df %>%
    mutate(Lat2 = ifelse(recvLat == 999.00, 42.68, recvLat))

In base R, we can do this by creating an index

i1 <- df$recvLat == 999
df$recLat[i1] <- 42.68

NOTE: Both the solutions work.

data

df <- data.frame(recvLat = c(999, 1.5,  2.5))

Upvotes: 0

geneorama
geneorama

Reputation: 3720

Actually, this is a nice case to show the elegance of data.table as well.

library(data.table)

## Create example
data <- data.table(lat = c(999, 0, 0),
                   lon = c(0, 999, 0))

## Reassign values
data[lat==999, lat := 42.68]
data[lon==999, lon := -72.47]

## Print results
data
#      lat    lon
# 1: 42.68   0.00
# 2:  0.00 -72.47
# 3:  0.00   0.00

The downside is that you have to remember that := is needed for assignment.

The upsides are

  • You can refer to variables by name without quotes
  • Memory efficient, especially important for large data sets
  • Doesn't have dependencies
  • The data.table syntax is much simpler and consistent (i, j, group by)
  • You don't need to memorize a bunch of functions with weird names that may and/or may not be spelled Aussie style (like colour or summarise)
  • You can use base R more, which makes your code more portable and widely understood
  • The data.table class inherits the data.frame class, so it's more compatible within R

Upvotes: 1

boski
boski

Reputation: 2467

why not use this.

data=as.data.frame(matrix(0,3,3))
names(data)=c("a","b","c")
data$a[1]=999
data$c[2]=999
data$a[which(data$a==999)]=42.68
data$c[which(data$c==999)]=-72.47
data
      a b      c
1 42.68 0   0.00
2  0.00 0 -72.47
3  0.00 0   0.00

Upvotes: 2

Related Questions