Reputation: 1092
I have a dataframe like this:
df <- data.frame(theme1=c("hello",NA,NA,NA), theme2=c(NA,"world",NA,NA), theme3=c(NA,NA,"good_morning",NA), theme4=c(NA,NA,NA,"good_evening"))
theme1 theme2 theme3 theme4
1 hello NA NA NA
2 NA world NA NA
3 NA NA good_morning NA
4 NA NA NA good_evening
Now i want to obtain one column with preserving the row order:
**Theme_merged**
hello
world
good_morning
good_evening
Tries:
merge_themes <- data.frame(cbind(mycol = na.omit(unlist(data2_tst[18:23]))), stringsAsFactors = F)
The above code works but does not preserve the row order so when i want to place the vector back to the original dataframe it does not match anymore.
Real data:
dput(head(data2_tst[18:23], n = 50))
structure(list(Theme1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, "%Bedrukken%", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, "%Bedrukken%", NA, NA, NA, NA, NA, NA, NA, NA), Theme2 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "%Nieuwste|Nieuwe|201[6:7]%",
"%Nieuwste|Nieuwe|201[6:7]%", "%Nieuwste|Nieuwe|201[6:7]%", NA,
NA, NA, NA, NA, "%Nieuwste|Nieuwe|201[6:7]%", "%Nieuwste|Nieuwe|201[6:7]%",
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "%Nieuwste|Nieuwe|201[6:7]%",
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "%Nieuwste|Nieuwe|201[6:7]%",
"%Nieuwste|Nieuwe|201[6:7]%"), Theme3 = c("%Nodig%", NA, "%Nodig%",
"%Nodig%", "%Nodig%", NA, NA, "%Nodig%", NA, "%Nodig%", NA, NA,
NA, NA, "%Nodig%", "%Nodig%", "%Nodig%", NA, NA, NA, NA, NA,
NA, "%Nodig%", "%Nodig%", NA, NA, "%Nodig%", NA, "%Nodig%", "%Nodig%",
"%Nodig%", NA, "%Nodig%", "%Nodig%", "%Nodig%", NA, NA, NA, "%Nodig%",
"%Nodig%", NA, "%Nodig%", NA, "%Nodig%", "%Nodig%", NA, "%Nodig%",
NA, NA), Theme4 = c(NA, "%Kopen%", NA, NA, NA, "%Kopen%", "%Kopen%",
NA, "%Kopen%", NA, NA, NA, NA, NA, NA, NA, NA, "%Kopen%", "%Kopen%",
NA, NA, "%Kopen%", "%Kopen%", NA, NA, "%Kopen%", "%Kopen%", NA,
"%Kopen%", NA, NA, NA, NA, NA, NA, NA, "%Kopen%", "%Kopen%",
"%Kopen%", NA, NA, NA, NA, "%Kopen%", NA, NA, "%Kopen%", NA,
NA, NA), Theme5 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), Theme6 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_)), .Names = c("Theme1",
"Theme2", "Theme3", "Theme4", "Theme5", "Theme6"), row.names = 3:52, class = "data.frame")
Upvotes: 2
Views: 218
Reputation: 1378
This should work with your data:
new_df = c(as.matrix(df))
This line first converts the df
to a matrix and binds all the columns in one vector with c()
.
new_df <- new_df[!is.na(new_df)]
And now we keep only the non-NA
entries. If you want you can convert it back to a dataframe:
new_df <- data.frame(new_df);names(new_df) <- "Themes"
Upvotes: 0
Reputation: 18602
In SQL this would be the COALESCE
function:
apply(df, 1, function(r) c(na.omit(r), NA)[1])
# [1] "hello" "world" "good_morning" "good_evening"
df <- data.frame(
theme1=c("hello",NA,NA,NA),
theme2=c(NA,"world",NA,NA),
theme3=c(NA,NA,"good_morning",NA),
theme4=c(NA,NA,NA,"good_evening"),
stringsAsFactors = FALSE
)
On your example data na.omit(unlist(df2, use.names = FALSE))
will work fine, but it will fail if there is a row of only NA
values:
df2 <- data.frame(
theme1=c("hello",NA,NA,NA,NA),
theme2=c(NA,"world",NA,NA,NA),
theme3=c(NA,NA,"good_morning",NA,NA),
theme4=c(NA,NA,NA,"good_evening",NA),
theme5=c(NA_character_,NA_character_,NA_character_,
NA_character_,NA_character_),
stringsAsFactors = FALSE
)
df2$X <- na.omit(unlist(df2, use.names = FALSE))
# Error in `$<-.data.frame`(`*tmp*`, "X", value = c("hello", "world", "good_morning", :
# replacement has 4 rows, data has 5
df2$X <- apply(df2, 1, function(r) c(na.omit(r), NA)[1])
# theme1 theme2 theme3 theme4 theme5 X
# 1 hello <NA> <NA> <NA> <NA> hello
# 2 <NA> world <NA> <NA> <NA> world
# 3 <NA> <NA> good_morning <NA> <NA> good_morning
# 4 <NA> <NA> <NA> good_evening <NA> good_evening
# 5 <NA> <NA> <NA> <NA> <NA> <NA>
Another option could be df2$X <- df2[cbind(1:nrow(df2), max.col(!is.na(df2)))]
Upvotes: 1
Reputation: 71
Version 0.5.0 of dplyr introduced the coalesce function:
This version of dplyr gains a number of vector functions inspired by SQL. Two functions make it a little easier to eliminate or generate missing values:
Given a set of vectors, coalesce() finds the first non-missing value in each position.
To apply this to the sample data frame you can use:
df <- mutate_all(df, .funs = as.character)
df$merged <- with(df, coalesce(theme1, theme2, theme3, theme4))
I found it necessary to convert from factors to character to avoid an 'invalid factor levels' error.
On your real data no conversion is necessary:
df$merged <- with(df, coalesce(Theme1, Theme2, Theme3, Theme4, Theme5, Theme6)
Upvotes: 1
Reputation: 5008
Here's a tidyverse solution (uses dplyr
and tidyr
or just tidyverse
)
library(tidyverse)
> df <- df %>%
gather("theme", "theme_merged", 1:4) %>%
filter(!is.na(theme_merged)) %>%
select(theme_merged)
> df
theme_merged
1 hello
2 world
3 good_morning
4 good_evening
Upvotes: 0