Reputation: 1
Currently my data frame consists of 1 column with 2326 rows. I want to transform my data frame to consist of 11 rows. More specifically, my current data frame look like
John Doe
7
45
42
978
3
6
8
9
0
11
Sally Jenkins
2
etc.
I want my data frame to look like:
John Doe 7 45 42 978 3 6 8 9 0 11
Sally Jenkins 2
Each person is a row and then their statistics are each a separate column in the row. Some people are missing a statistic so I can not separate based on the number of rows between each person's name.
I have tried using t()
as well as reshape(transform())
. Do you have any suggestions on where to go next?
Upvotes: 0
Views: 778
Reputation: 887851
1) One option is to create a grouping variable based on the presence of alphabets, summarise
by creating a list
output and use unnest_wider
to change that list
to a set of new columns
library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
group_by(grp = cumsum(str_detect(col1, "[A-Za-z]"))) %>%
group_by(coln = first(col1), .add = TRUE) %>%
slice(-1) %>%
summarise(out = list(as.list(as.numeric(col1)))) %>%
unnest_wider(c(out)) %>%
ungroup %>%
select(-grp) %>%
rename_at(-1, ~ str_c('new_col', seq_along(.)))
# A tibble: 2 x 11
# coln new_col1 new_col2 new_col3 new_col4 new_col5 new_col6 new_col7 new_col8 new_col9 new_col10
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 John Doe 7 45 42 978 3 6 8 9 0 11
#2 Sally Jenkins 2 NA NA NA NA NA NA NA NA NA
2) Or using base R
(No packages used)
grp <- with(df1, ave(col1, cumsum(grepl('[A-Za-z]', col1)),
FUN = function(x) x[1]))
aggregate(values ~ ind, stack(split(as.numeric(df1$col1[duplicated(grp)]),
grp[duplicated(grp)])), FUN = I)
# ind values
#1 John Doe 7, 45, 42, 978, 3, 6, 8, 9, 0, 11
#2 Sally Jenkins 2
df1 <- structure(list(col1 = c("John Doe", "7", "45", "42", "978", "3",
"6", "8", "9", "0", "11", "Sally Jenkins", "2")), class = "data.frame",
row.names = c(NA,
-13L))
Upvotes: 1
Reputation: 270195
Using DF
defined reproducibly in the Note at the end, ensure that DF[[1]]
is character. Then use grep
to create a logical variable g
with one element per input row which is TRUE for each heading and FALSE otherwise. Then remove the headings and and split it up using the setNames
to set the names of the resulting list. Finally convert each list component to a numeric ts
series as such series can be cbind'ed even if they have different lengths. Now cbind
those together and transpose. No packages are used.
DF[[1]] <- as.character(DF[[1]])
g <- grepl("\\D", DF[[1]])
s <- setNames(split(DF[[1]][!g], cumsum(g)[!g]), DF[[1]][g])
t(do.call("cbind", lapply(s, function(x) ts(as.numeric(x)))))
giving:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
John Doe 7 45 42 978 3 6 8 9 0 11
Sally Jenkins 2 NA NA NA NA NA NA NA NA NA
The input in reproducible form is assumed to be:
Lines <- "John Doe
7
45
42
978
3
6
8
9
0
11
Sally Jenkins
2"
DF <- read.table(text = Lines, as.is = TRUE, sep = ";")
Upvotes: 0