Reputation: 477
So I have a data frame like this, with names of species present in three countries, with some NAs in countries where a certain species does not exist:
Country_A | Country_B | Country_C
-----------------------------------------------------
Tilapia guineensis | NA | Tilapia guineensisi
Tilapia zillii | Tilapia zillii | Tilapia zillii
NA | Fundulus rubrifrons | Fundulus rubrifrons
Eutrigla gurnardus | Eutrigla gurnardus | NA
Sprattus sprattus | NA | NA
What I'm trying to do is basically check if a species is present in either one, two or three countries and make a data frame to look something like this:
Species name | Country_A | Country_B | Country_C
---------------------------------------------------------------------
Tilapia guineensis | present | not_present | present
Tilapia zillii | present | present | present
Fundulus rubrifrons | not_present | present | present
Eutrigla gurnardus | present | present | not_present
Sprattus sprattus | present | not_present | not_present
I think maybe using the spread function can be a way to go, or the ifelse function, but I don't really know how to implement it. Thank you so much for any answer
Upvotes: 2
Views: 43
Reputation: 193667
As mentioned in the comments, this can be approached with a melt
+ dcast
(or pivot_longer
+ pivot_wider
) solution. Here are two alternatives:
melt
+ dcast
library(data.table)
melt(as.data.table(df), measure = names(df), na.rm = TRUE, value.name = "Species_Name")[
, ind := "present"][
, dcast(.SD, Species_Name ~ variable, value.var = "ind", fill = "not_present")]
# Species_Name Country_A Country_B Country_C
# 1: Eutrigla gurnardus present present not_present
# 2: Fundulus rubrifrons not_present present present
# 3: Sprattus sprattus present not_present not_present
# 4: Tilapia guineensis present not_present present
# 5: Tilapia zillii present present present
pivot_longer
+ pivot_wider
library(dplyr)
library(tidyr)
df %>%
pivot_longer(everything(), values_drop_na = TRUE, values_to = "Species_Name") %>%
mutate(ind = "present") %>%
pivot_wider(names_from = name, values_from = ind, values_fill = "not_present")
# # A tibble: 5 x 4
# Species_Name Country_A Country_C Country_B
# <chr> <chr> <chr> <chr>
# 1 Tilapia guineensis present present not_present
# 2 Tilapia zillii present present present
# 3 Fundulus rubrifrons not_present present present
# 4 Eutrigla gurnardus present not_present present
# 5 Sprattus sprattus present not_present not_present
Upvotes: 1
Reputation: 887721
Here, is an option with is.na
. Loop over the columns of the dataset with lapply
, create a logical vector with is.na
, convert it to numeric index, replace the values with a vector of strings and bind them into the transmute
ed 'Species_name' column created with coalesce
lst1 <- lapply(df1, function(x) c("present", "not_present")[1 + is.na(x)])
library(dplyr)
df1 %>%
transmute(Species_name = coalesce(!!! .)) %>%
bind_cols(lst1)
-output
# Species_name Country_A Country_B Country_C
#1 Tilapia guineensis present not_present present
#2 Tilapia zillii present present present
#3 Fundulus rubrifrons not_present present present
#4 Eutrigla gurnardus present present not_present
#5 Sprattus sprattus present not_present not_present
Or if we want to do this solely on tidyverse
, an option using only dplyr
and more compactly
df1 %>%
mutate(Species_name = coalesce(!!! .),
across(starts_with('Country'),
~c("present", "not_present")[1 + is.na(.)]))
# Country_A Country_B Country_C Species_name
#1 present not_present present Tilapia guineensis
#2 present present present Tilapia zillii
#3 not_present present present Fundulus rubrifrons
#4 present present not_present Eutrigla gurnardus
#5 present not_present not_present Sprattus sprattus
df1 <- structure(list(Country_A = c("Tilapia guineensis", "Tilapia zillii",
NA, "Eutrigla gurnardus", "Sprattus sprattus"), Country_B = c(NA,
"Tilapia zillii", "Fundulus rubrifrons", "Eutrigla gurnardus",
NA), Country_C = c("Tilapia guineensisi", "Tilapia zillii",
"Fundulus rubrifrons",
NA, NA)), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 2
Reputation: 39613
Using some tidyverse
functions and reshaping data smartly you can get:
library(dplyr)
library(tidyr)
#Code
newdf <- df %>% pivot_longer(everything()) %>%
select(value) %>%
filter(!is.na(value)) %>%
filter(!duplicated(value)) %>%
left_join(
df %>% mutate(id=row_number()) %>%
pivot_longer(-id) %>%
filter(!is.na(value)) %>% mutate(Val='present')) %>%
pivot_wider(names_from = name,values_from=Val,values_fill='not_present',
names_sort=T) %>%
select(-id) %>% rename(Species=value)
Output:
# A tibble: 5 x 4
Species Country_A Country_B Country_C
<chr> <chr> <chr> <chr>
1 Tilapia guineensis present not_present present
2 Tilapia zillii present present present
3 Fundulus rubrifrons not_present present present
4 Eutrigla gurnardus present present not_present
5 Sprattus sprattus present not_present not_present
Some data used:
#Data
df <- structure(list(Country_A = c("Tilapia guineensis", "Tilapia zillii",
NA, "Eutrigla gurnardus", "Sprattus sprattus"), Country_B = c(NA,
"Tilapia zillii", "Fundulus rubrifrons", "Eutrigla gurnardus",
NA), Country_C = c("Tilapia guineensis", "Tilapia zillii", "Fundulus rubrifrons",
NA, NA)), row.names = c(NA, -5L), class = "data.frame")
Upvotes: 2