Reputation: 477

How to remove the NAs of a data frame by changing the disposition of the data?

So I have a data frame like this, with names of species present in three countries, with some NAs in countries where a certain species does not exist:

     Country_A      |      Country_B      |    Country_C   
-----------------------------------------------------
Tilapia guineensis  |         NA          | Tilapia guineensisi
Tilapia zillii      | Tilapia zillii      | Tilapia zillii
       NA           | Fundulus rubrifrons |  Fundulus rubrifrons
Eutrigla gurnardus  | Eutrigla gurnardus  |       NA
Sprattus sprattus   |         NA          |       NA

What I'm trying to do is basically check if a species is present in either one, two or three countries and make a data frame to look something like this:

     Species name   |      Country_A     |    Country_B    | Country_C 
---------------------------------------------------------------------
Tilapia guineensis  |      present       |   not_present   | present
Tilapia zillii      |      present       |   present       | present
Fundulus rubrifrons |     not_present    |   present       | present 
Eutrigla gurnardus  |      present       |   present       | not_present
Sprattus sprattus   |      present       |   not_present   | not_present

I think maybe using the spread function can be a way to go, or the ifelse function, but I don't really know how to implement it. Thank you so much for any answer

Upvotes: 2

Answers (3)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193667

As mentioned in the comments, this can be approached with a melt + dcast (or pivot_longer + pivot_wider) solution. Here are two alternatives:

`melt` + `dcast`

library(data.table)
melt(as.data.table(df), measure = names(df), na.rm = TRUE, value.name = "Species_Name")[
  , ind := "present"][
    , dcast(.SD, Species_Name ~ variable, value.var = "ind", fill = "not_present")]
#           Species_Name   Country_A   Country_B   Country_C
# 1:  Eutrigla gurnardus     present     present  not_present
# 2: Fundulus rubrifrons not_present     present     present
# 3:   Sprattus sprattus     present not_present not_present
# 4:  Tilapia guineensis     present not_present     present
# 5:      Tilapia zillii     present     present     present

`pivot_longer` + `pivot_wider`

library(dplyr)
library(tidyr)
df %>%
  pivot_longer(everything(), values_drop_na = TRUE, values_to = "Species_Name") %>%
  mutate(ind = "present") %>%
  pivot_wider(names_from = name, values_from = ind, values_fill = "not_present")
# # A tibble: 5 x 4
#   Species_Name        Country_A   Country_C   Country_B  
#   <chr>               <chr>       <chr>       <chr>      
# 1 Tilapia guineensis  present     present     not_present
# 2 Tilapia zillii      present     present     present    
# 3 Fundulus rubrifrons not_present present     present    
# 4 Eutrigla gurnardus  present     not_present present 
# 5 Sprattus sprattus   present     not_present not_present

Upvotes: 1

akrun

Reputation: 887721

Here, is an option with is.na. Loop over the columns of the dataset with lapply, create a logical vector with is.na, convert it to numeric index, replace the values with a vector of strings and bind them into the transmuteed 'Species_name' column created with coalesce

lst1 <- lapply(df1, function(x) c("present", "not_present")[1 + is.na(x)])

library(dplyr)
df1 %>%
    transmute(Species_name = coalesce(!!! .)) %>%
    bind_cols(lst1)

-output

#         Species_name   Country_A   Country_B   Country_C
#1  Tilapia guineensis     present not_present     present
#2      Tilapia zillii     present     present     present
#3 Fundulus rubrifrons not_present     present     present
#4  Eutrigla gurnardus     present     present not_present
#5   Sprattus sprattus     present not_present not_present

Or if we want to do this solely on tidyverse, an option using only dplyr and more compactly

df1 %>% 
     mutate(Species_name = coalesce(!!! .),
        across(starts_with('Country'), 
                  ~c("present", "not_present")[1 + is.na(.)]))
#    Country_A   Country_B   Country_C        Species_name
#1     present not_present     present  Tilapia guineensis
#2     present     present     present      Tilapia zillii
#3 not_present     present     present Fundulus rubrifrons
#4     present     present not_present  Eutrigla gurnardus
#5     present not_present not_present   Sprattus sprattus

data

df1 <- structure(list(Country_A = c("Tilapia guineensis", "Tilapia zillii", 
NA, "Eutrigla gurnardus", "Sprattus sprattus"), Country_B = c(NA, 
"Tilapia zillii", "Fundulus rubrifrons", "Eutrigla gurnardus", 
NA), Country_C = c("Tilapia guineensisi", "Tilapia zillii",
"Fundulus rubrifrons", 
NA, NA)), class = "data.frame", row.names = c(NA, -5L))

Upvotes: 2

Duck

Reputation: 39613

Using some tidyverse functions and reshaping data smartly you can get:

library(dplyr)
library(tidyr)
#Code
newdf <- df %>% pivot_longer(everything()) %>%
  select(value) %>%
  filter(!is.na(value)) %>%
  filter(!duplicated(value)) %>%
  left_join(
    df %>% mutate(id=row_number()) %>%
      pivot_longer(-id) %>%
      filter(!is.na(value)) %>% mutate(Val='present')) %>%
  pivot_wider(names_from = name,values_from=Val,values_fill='not_present',
              names_sort=T) %>%
  select(-id) %>% rename(Species=value)

Output:

# A tibble: 5 x 4
  Species             Country_A   Country_B   Country_C  
  <chr>               <chr>       <chr>       <chr>      
1 Tilapia guineensis  present     not_present present    
2 Tilapia zillii      present     present     present    
3 Fundulus rubrifrons not_present present     present    
4 Eutrigla gurnardus  present     present     not_present
5 Sprattus sprattus   present     not_present not_present

Some data used:

#Data
df <- structure(list(Country_A = c("Tilapia guineensis", "Tilapia zillii", 
NA, "Eutrigla gurnardus", "Sprattus sprattus"), Country_B = c(NA, 
"Tilapia zillii", "Fundulus rubrifrons", "Eutrigla gurnardus", 
NA), Country_C = c("Tilapia guineensis", "Tilapia zillii", "Fundulus rubrifrons", 
NA, NA)), row.names = c(NA, -5L), class = "data.frame")

Upvotes: 2

How to remove the NAs of a data frame by changing the disposition of the data?

Answers (3)

melt + dcast

pivot_longer + pivot_wider

data

Related Questions

`melt` + `dcast`

`pivot_longer` + `pivot_wider`