Reputation: 665

Restructure data based on row numbers in R

I am having trouble restructuring my data the way I need. My df looks like this:

id <- (1:20)
author <- c("A","A","A","A","A","B","B","B","A","A","A","B","B","B","B"
            ,"B","B","B","A","A")
df <- data.frame(id, author)

> print(df)

   id author
1   1      A
2   2      A
3   3      A
4   4      A
5   5      A
6   6      B
7   7      B
8   8      B
9   9      A
10 10      A
11 11      A
12 12      B
13 13      B
14 14      B
15 15      B
16 16      B
17 17      B
18 18      B
19 19      A
20 20      A

I'm trying to get a data structure where the columns are the authors and the rows hold the first and last id values of each consecutive run of A or B values. In this case the first row with author A is id = 1 and the last one of that run is id = 5, and so forth. Something like this:

A <- c(1, 5, 9, 11, 19,20)
B <- c(6, 8, 12, 18, NA, NA)
df.desired <- data.frame(A, B)
print(df.desired)
   A  B
1  1  6
2  5  8
3  9 12
4 11 18
5 19 NA
6 20 NA

Any ideas? Thanks a lot!

Upvotes: 3

Answers (3)

chinsoon12

Reputation: 25225

An option using data.table:

library(data.table)
dcast(
    setDT(df)[, ri := rleid(author)][, id[c(1L, .N)], .(author, ri)],
    rowid(author) ~ author, value.var="V1")

output:

   author  A  B
1:      1  1  6
2:      2  5  8
3:      3  9 12
4:      4 11 18
5:      5 19 NA
6:      6 20 NA

If an author can have a run of just a single row, you will need unique(c(1L, .N)) in place of c(1L, .N), so that the row's id is not returned twice.
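
To illustrate the single-row case, here is a minimal sketch on hypothetical toy data (not the data from the question), assuming data.table is installed. The names dt, dup and once are made up for this example. It only compares the two indexing variants and leaves out the dcast step.

```r
library(data.table)

# Hypothetical toy data: author B has a run of length one (id 3)
dt <- data.table(id = 1:5, author = c("A", "A", "B", "A", "A"))

# Label consecutive runs of the same author
dt[, ri := rleid(author)]

# c(1L, .N) picks the first and last row of each run;
# for a one-row run both indices are 1, so the id is repeated
dup <- dt[, .(id = id[c(1L, .N)]), by = .(author, ri)]

# unique() collapses the repeated index, so the id appears once
once <- dt[, .(id = id[unique(c(1L, .N))]), by = .(author, ri)]

dup[author == "B", id]   # 3 3
once[author == "B", id]  # 3
```

Note that with unique() a one-row run contributes only one value, so the first/last pairs in the wide result no longer line up in twos for that author.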

Upvotes: 1

Ronak Shah

Reputation: 388907

We can create run groups with data.table's rleid, select the first and last row of each group, and reshape the data to wide format.

library(dplyr)

df %>%
  group_by(grp = data.table::rleid(author)) %>%
  slice(1L, n()) %>%
  group_by(author) %>%
  mutate(grp = row_number()) %>%
  tidyr::pivot_wider(names_from = author, values_from = id) %>%
  select(-grp)

# A tibble: 6 x 2
#      A     B
#  <int> <int>
#1     1     6
#2     5     8
#3     9    12
#4    11    18
#5    19    NA
#6    20    NA
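
The grouping step above relies on giving every consecutive run of the same author its own id. As a rough sketch of that idea in base R only, on a hypothetical toy vector, the run ids can be built from rle(). For a single vector this should give the same ids as data.table::rleid, though that equivalence is stated here as an assumption rather than tested.

```r
# Hypothetical toy vector of authors
author <- c("A", "A", "A", "B", "B", "A")

# rle() compresses the vector into run values and run lengths
runs <- rle(author)

# Repeat each run's index by its length to get one run id per row
run_id <- rep(seq_along(runs$lengths), times = runs$lengths)
run_id
# 1 1 1 2 2 3

# First and last row position of each run
first <- tapply(seq_along(author), run_id, min)
last  <- tapply(seq_along(author), run_id, max)
```

Here first and last hold the row positions 1, 4, 6 and 3, 5, 6, which is the same information that slice(1L, n()) extracts per group.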

For the updated request in the comments (apparently one row per run, with its first and last id as columns), we can do:

df %>%
  group_by(grp = data.table::rleid(author)) %>%
  slice(1L, n()) %>%
  mutate(author = row_number()) %>%
  tidyr::pivot_wider(names_from = author, values_from = id) %>%
  ungroup %>%
  select(-grp)

# A tibble: 5 x 2
#    `1`   `2`
#  <int> <int>
#1     1     5
#2     6     8
#3     9    11
#4    12    18
#5    19    20

Upvotes: 3

ThomasIsCoding

Reputation: 101247

Here is a base R option

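# Run-length encode the authors; the cumulative lengths mark where each run ends
z <- rle(df$author)
# Split the rows into runs, then take the smallest and largest id of each run
# (these are the first and last ids here because id is increasing)
lst <- split(df, findInterval(seq_len(nrow(df)), cumsum(z$lengths), left.open = TRUE))
u <- lapply(lst, function(v) range(v$id))
# Collect the run indices per author and pad the shorter column with NA
idx <- split(seq_along(z$values), z$values)
x <- lapply(idx, function(v) unlist(u[v], use.names = FALSE))
df.desired <- as.data.frame(lapply(x, `length<-`, max(lengths(x))))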

which gives

> df.desired
   A  B
1  1  6
2  5  8
3  9 12
4 11 18
5 19 NA
6 20 NA

Upvotes: 1
