Kwai Sheng
Kwai Sheng

Reputation: 11

how to swap data between two columns based on the data type? [R]

ds <- read.csv(file="~/Desktop/abalone.csv)

Sex  Length
 M     12.1
 F     11.2
12.1    F
 I     11.2
12.1   11.2
11.2    I

The dataset contains two columns which are sex and Length. The sex column contains three values which are "F" (Female), "M" (Male) and "I" (infant). The dataset has been modified and has several issues which I need to solve. I need to solve the following problems

  1. misfielded values (need help for this question)
  2. Invalid values(solved)

My idea to solve the misfielded values is to swap the data between two columns based on the value in Length. let say if the value in Length is "F", "M" and "I", then swap the data with Sex. I really have no idea how to swap the data between these two column based on the value in length.

swap_if <-function (ds$Sex, ds$Length, missing = NA){
  a <- ds$Length == "M"
  b <- ds$Length == "F"
  c <- ds$Length == "I"}

swap_if(ds$Sex, ds$Length)

but this cannot work. can someone help me with this. thank you so much

After that, I need to remove the invalid values. the fifth row first column has an invalid values as the value of Sex must be 'F', 'M' and 'I'.

#remove wrong format /invalid data in sex column
ds <- ds[!str_detect(ds$Sex, '([0-9])'), ]
ds$Sex <- as.factor(ds$Sex)

^ this code can remove the row with invalid / wrong format values.

Upvotes: 1

Views: 233

Answers (3)

r2evans
r2evans

Reputation: 160407

Despite what I recommended (fixing the source of the problem), here's a method for trying to recover.

column <- grepl("^[.0-9]+$", dat[,1])
column
# [1] FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE
dat2 <- data.frame(Sex = dat[cbind(seq_len(nrow(dat)),1+column)], Length = dat[cbind(seq_len(nrow(dat)),2-column)])
dat2$Length <- as.numeric(dat2$Length)
dat2
#   Sex Length
# 1   M   12.2
# 2   F   14.1
# 3   M   14.6
# 4   F    9.0
# 5   L   12.1
# 6   M   10.1
# 7   F   11.0
# 8   M   11.9

A tidyverse option:

library(dplyr)
dat %>%
  mutate(
    swap    = grepl("[^.[:digit:]]", Length), 
    Length2 = if_else(swap, Sex, Length), 
    Sex2    = if_else(swap, Length, Sex)
  ) %>%
  transmute(
    Sex    = Sex2, 
    Length = as.numeric(Length2)
  )
#   Sex Length
# 1   M   12.2
# 2   F   14.1
# 3   M   14.6
# 4   F    9.0
# 5   L   12.1
# 6   M   10.1
# 7   F   11.0
# 8   M   11.9

Data:

dat <- structure(list(Sex = c("M", "F", "M", "9", "L", "10.1", "11", 
"M"), Length = c("12.2", "14.1", "14.6", "F", "12.1", "M", "F", 
"11.9")), class = "data.frame", row.names = c(NA, -8L))

Upvotes: 1

MarBlo
MarBlo

Reputation: 4524

Here is another solution which works with tidyverse. It creates a little helper column which will be deleted at the end.

library(tidyverse)

# create the inverse of `%in%`
`%!in%` = Negate(`%in%`)

df %>% 
  # create a helper column, will be deleted
  mutate(help = Sex) %>% 
  mutate(Sex = case_when(
    Sex  %!in% c('L', 'M', 'F') ~ Length,
    TRUE ~ Sex
  )) %>% 
  mutate(Length = case_when(
    Length %in% c('L', 'M', 'F') ~ help,
    TRUE ~ Length
  )) %>% 
  # delete helper column
  select(-help)
#>   Sex Length
#> 1   M   12.2
#> 2   F   14.1
#> 3   M   14.6
#> 4   F      9
#> 5   L   12.1
#> 6   M   10.1
#> 7   F     11
#> 8   M   11.9

Upvotes: 1

Todd Burus
Todd Burus

Reputation: 983

You can set up a for loop with an if-then to check if the character variable can safely be converted to a numeric. If so, it belongs in length.

sex = c('M','F', 'M', 9, 'L', 10.1, 11, 'M')
length = c(12.2, 14.1, 14.6, 'F', 12.1, 'M', 'F', 11.9)

df = data.frame(sex, length)
str(df)

df$sex.c = 0
df$length.c = 0

for (i in 1:length(df$sex)){
  if (!is.na(as.numeric(df$sex[i]))){
    df$sex.c[i] = df$length[i]
    df$length.c[i] = df$sex[i]
  } else {
    df$sex.c[i] = df$sex[i]
    df$length.c[i] = df$length[i]
  }
}

df$sex = df$sex.c
df$length = df$length.c

df = df[,1:2]

Note that this creates a NAs introduced by coercion warning. If that bothers you then you can suppress them by altering the for loop as follows:

for (i in 1:length(df$sex)){
  if (suppressWarnings(!is.na(as.numeric(df$sex[i])))){
    df$sex.c[i] = df$length[i]
    df$length.c[i] = df$sex[i]
  } else {
    df$sex.c[i] = df$sex[i]
    df$length.c[i] = df$length[i]
  }
}

Upvotes: 0

Related Questions