Reputation: 33
I have a dataframe that looks like this (but for every US county)
county | state | n_state_1 | n_state_2 | n_state_3 | n_state_4 |
---|---|---|---|---|---|
Autauga County | AL | NA | FL | NA | NA |
Baldwin County | AL | GA | NA | TN | NA |
Catron County | AL | FL | GA | NA | CA |
I want to move the non-missing values (FL,GA,TN etc.) to the first columns starting from n_state_1 and then delete the columns containing only missing values to get:
county | state | n_state_1 | n_state_2 | n_state_3 |
---|---|---|---|---|
Autauga County | AL | FL | NA | NA |
Baldwin County | AL | GA | TN | NA |
Catron County | AL | FL | GA | CA |
I am struggling with the first step. I thought about using the function distinct but it doesn't work because there are non-empty elements in each column.
Upvotes: 3
Views: 108
Reputation: 886938
Or another option with dapply
from collapse
and select
only columns with any
non-NA elements
library(collapse)
library(dplyr)
dapply(df1, MARGIN = 1, FUN = function(x) c(x[!is.na(x)], x[is.na(x)])) %>%
select(where(~ any(complete.cases(.))))
# A tibble: 3 x 5
county state n_state_1 n_state_2 n_state_3
<chr> <chr> <chr> <chr> <chr>
1 Autauga_County AL FL <NA> <NA>
2 Baldwin_County AL GA TN <NA>
3 Catron_County AL FL GA CA
Upvotes: 2
Reputation: 16978
You could use dplyr
and tidyr
:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(starts_with("n_state")) %>%
drop_na() %>%
group_by(county, state) %>%
mutate(name=row_number()) %>%
pivot_wider(names_prefix="n_state_")
which returns
county state n_state_1 n_state_2 n_state_3
<chr> <chr> <chr> <chr> <chr>
1 Autauga_County AL FL NA NA
2 Baldwin_County AL GA TN NA
3 Catron_County AL FL GA CA
What happened here?
pivot_longer
takes the n_state_{n}
-columns and collapses them into two columns: the name
-column contains the original column name (n_state_1
, n_state_2
etc), the value
-column contains the states (FL
, GA
or <NA>
in many cases).<NA>
entry. (Note: I use <NA>
to make clear it's an NA
-value).)county
and state
we add a rownumber. These numbers will be later used to create the new column names.pivot_wider
now takes these row numbers and prefixes them with n_state_
to get the new columns. The values are taken from the value
-column created in the second line of code. pivot_wider
fills the missing values with <NA>
-values (default behaviour).structure(list(county = c("Autauga_County", "Baldwin_County",
"Catron_County"), state = c("AL", "AL", "AL"), n_state_1 = c(NA,
"GA", "FL"), n_state_2 = c("FL", NA, "GA"), n_state_3 = c(NA,
"TN", NA), n_state_4 = c(NA, NA, "CA")), problems = structure(list(
row = 3L, col = "n_state_4", expected = "", actual = "embedded null",
file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame")), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -3L), spec = structure(list(
cols = list(county = structure(list(), class = c("collector_character",
"collector")), state = structure(list(), class = c("collector_character",
"collector")), n_state_1 = structure(list(), class = c("collector_character",
"collector")), n_state_2 = structure(list(), class = c("collector_character",
"collector")), n_state_3 = structure(list(), class = c("collector_character",
"collector")), n_state_4 = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
Upvotes: 3