Reputation: 446
Im not sure if this is possible, I have a large dataframe which I'm trying to manipulate into a format to work with. Essentially what I want is any element in column 1 which contains text is to be appended to the end of the previous row which have an id number in column 1. What makes this complicated is that there can be numerous text elements between numerical IDs.
Example
df <- structure(list(V1 = structure(c(1L, 4L, 2L, 4L, 5L, 3L, 4L), .Label = c("1234","2345", "3456", "text1", "text2"), class = "factor"), V2 = structure(1:7, .Label = c("data1", "data2", "data3", "data4", "data5", "data6", "data7"), class = "factor")), class = "data.frame", row.names = c(NA,
-7L))
It appears like so
1234 data1
text1 data2
2345 data3
text1 data4
text2 data5
3456 data6
text1 data7
What I want is
1234 data1 text1 data2
2345 data3 text1 data4 text2 data5
3456 data6 text1 data7
Upvotes: 1
Views: 164
Reputation: 40171
One possibility involving dplyr
and tidyr
could be:
df %>%
mutate(grp = cumsum(!grepl("text", V1, fixed = TRUE))) %>%
gather(var, val, -grp) %>%
group_by(grp) %>%
mutate(var = paste0("V", row_number())) %>%
ungroup() %>%
spread(var, val) %>%
select(-grp)
V1 V2 V3 V4 V5 V6
<chr> <chr> <chr> <chr> <chr> <chr>
1 1234 text1 data1 data2 <NA> <NA>
2 2345 text1 text2 data3 data4 data5
3 3456 text1 data6 data7 <NA> <NA>
Or if you want to match your desired output exactly:
df %>%
group_by(grp = cumsum(!grepl("text", V1, fixed = TRUE))) %>%
mutate(grp2 = row_number()) %>%
ungroup() %>%
gather(var, val, -c(grp, grp2)) %>%
arrange(grp, grp2) %>%
group_by(grp) %>%
mutate(var = paste0("V", row_number())) %>%
ungroup() %>%
select(-grp2) %>%
spread(var, val) %>%
select(-grp)
V1 V2 V3 V4 V5 V6
<chr> <chr> <chr> <chr> <chr> <chr>
1 1234 data1 text1 data2 <NA> <NA>
2 2345 data3 text1 data4 text2 data5
3 3456 data6 text1 data7 <NA> <NA>
Upvotes: 2