Reputation: 111
I have a df
looking like this:
Department ID Category Category.ID
NA NA NA NA
Sales 101 2 4
Sales 101 2 4
NA NA NA NA
Sales 101 2 4
Sales 101 2 4
NA NA NA NA
Sales 101 2 4
Sales 101 2 4
df = data.frame(Department = rep(c(NA, 'Sales', 'Sales'), times = 3),
ID = rep(c(NA, 101, 101), times = 3),
Category.Department = rep(c(NA, 2, 2), times = 3),
Category.ID = rep(c(NA, 4, 4), times = 3), stringsAsFactors = FALSE)
And I would like to have an output like this, where in only one column I can have the Department
and ID
and in another one, the Category
. The NA
in each column it is important to separate the groups.
New.Col Category
NA NA
Sales 2
101 4
NA NA
Sales 2
101 4
NA NA
Sales 2
101 4
So far I tried with transpose
, sapply
and a function
but it has not worked as I expected. Any suggestions in base
?
Upvotes: 0
Views: 54
Reputation: 6441
Can't accept an accept without true expected output.
df$group <- rep(1:3, times = 3)
df2 <- reshape(df[df$group != 3,], direction = "long", varying = list(New.col = c(1,2), Category = c(3,4)),
idvar = "id", v.names = c("New.col", "Category"))
df3 <- df2[order(df2$id),]
df3[!(df3$time == 1 & df3$group == 1), c(3,4)]
New.col Category
1.2 <NA> NA
2.1 Sales 2
2.2 101 4
3.2 <NA> NA
4.1 Sales 2
4.2 101 4
5.2 <NA> NA
6.1 Sales 2
6.2 101 4
Upvotes: 1
Reputation: 51592
Here is a different approach than casting to long format, which relies in coalesce
. In addition, I created a group variable and removed the NA
rows as they will not serve a purpose in your analysis, i.e.
library(tidyverse)
df %>%
group_by(grp = cumsum(rowSums(is.na(.)) == ncol(.))) %>%
mutate_at(vars(contains('ID')), funs(lag)) %>%
mutate_at(vars(contains('Department')), funs(lead)) %>%
mutate(new.col = coalesce(Department, as.character(ID)),
category = coalesce(Category.Department, Category.ID)) %>%
select(grp, new.col, category) %>%
distinct()
which gives,
# A tibble: 6 x 3 # Groups: grp [3] grp new.col category <int> <chr> <dbl> 1 1 Sales 2 2 1 101 4 3 2 Sales 2 4 2 101 4 5 3 Sales 2 6 3 101 4
Upvotes: 0