Reputation: 33
I have a question regarding pivoting. If I have a dataset which I scrape from the web and only gave me all the data in column, how can I pivot the data to their specific column: example
Column |
---|
Date |
Number |
Additional Number |
2022-08-10 |
2,4,5,6 |
7 |
2022-08-11 |
2,4,5,3 |
11 |
2022-08-12 |
2,4,5,8 |
14 |
The ideal output of the table would be:
Date | Number | Additional Number |
---|---|---|
2022-08-10 | 2,4,5,6 | 7 |
2022-08-11 | 2,4,5,3 | 11 |
2022-08-12 | 2,4,5,8 | 14 |
Upvotes: 1
Views: 70
Reputation: 39707
You can use matrix
to reshape the vector. In case the data is stored in a data.frame
of one column subset this columns to get a vector (e.g.DF$s
).
matrix(s[-1:-3], ncol=3, byrow=TRUE, dimnames = list(NULL, s[1:3]))
# Date Number Additional Number
#[1,] "2022-08-10" "2,4,5,6" "7"
#[2,] "2022-08-11" "2,4,5,3" "11"
#[3,] "2022-08-12" "2,4,5,8" "14"
If needed this could be converted to a data.frame
with as.data.frame
. If needed the type of the columns could be converted by using type.convert
. If needed the column Date could be converted to Date
.
. <- matrix(s[-1:-3], ncol=3, byrow=TRUE, dimnames = list(NULL, s[1:3]))
. <- as.data.frame(.)
. <- type.convert(., as.is=TRUE)
.$Date <- as.Date(.$Date)
str(.)
#'data.frame': 3 obs. of 3 variables:
# $ Date : Date, format: "2022-08-10" "2022-08-11" ...
# $ Number : chr "2,4,5,6" "2,4,5,3" "2,4,5,8"
# $ Additional Number: int 7 11 14
As in the comments a solution using tidyr::pivot_wider
was expected:
tidyr::pivot_wider(data.frame(v=s[-1:-3], cn=s[1:3], row=0:(length(s)-4)%/%3),
values_from = v, names_from = cn)[-1]
# Date Number `Additional Number`
# <chr> <chr> <chr>
#1 2022-08-10 2,4,5,6 7
#2 2022-08-11 2,4,5,3 11
#3 2022-08-12 2,4,5,8 14
Data
s <- c("Date","Number","Additional Number","2022-08-10","2,4,5,6","7",
"2022-08-11","2,4,5,3","11","2022-08-12","2,4,5,8","14")
Benchmark
bench::mark(check =FALSE,
matrix = matrix(s[-1:-3], ncol=3, byrow=TRUE, dimnames = list(NULL, s[1:3])),
pivot_wider = tidyr::pivot_wider(data.frame(v=s[-1:-3], cn=s[1:3], row=0:(length(s)-4)%/%3), values_from = v, names_from = cn)[-1],
"Anoushiravan R" = {df <- as.data.frame(t(do.call(cbind, split(s, rep(1:(length(s) %/% 3), each = 3)))))
names(df) <- as.character(unlist(df[1,]))
df <- df[-1, ]
row.names(df) <- NULL
df},
)
# expression min median `itr/sec` mem_a…¹ gc/se…² n_itr n_gc total…³
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:b> <dbl> <int> <dbl> <bch:t>
#1 matrix 1.69µs 2.08µs 397335. 0B 0 10000 0 25.2ms
#2 pivot_wider 5.78ms 6.29ms 156. 34.8KB 6.32 74 3 474.4ms
#3 Anoushiravan R 84.62µs 91.95µs 9446. 0B 12.8 4436 6 469.6ms
Using matrix
is in this case about 2500 times faster than using tidyr::pivot_wider
.
Upvotes: 1
Reputation: 21938
Here is another try in base R if your original data is a data frame with one column:
df <- as.data.frame(t(do.call(cbind, split(df, rep(1:(nrow(df) %/% 3), each = 3)))))
names(df) <- as.character(unlist(df[1,]))
df <- df[-1, ]
row.names(df) <- NULL
df
Date Number Additional Number
1 2022-08-10 2,4,5,6 7
2 2022-08-11 2,4,5,3 11
3 2022-08-12 2,4,5,8 14
Upvotes: 1
Reputation: 1023
I am putting a more "classic" solution to the problem below. This involves the modulo method which is useful to unpivot data like this.
library(tidyverse)
df <- tibble(x=letters)
df %>%
mutate(modulo=row_number()%%4, #pattern repeats every 4 entries so i put a 4
id=row_number()) %>% # row id
pivot_wider(names_from=modulo,values_from=x) %>%
fill(3:5,.direction="up") %>% #leave the first column and fill direction up
rename(col=2,col1=3,col2=4,col3=5) %>% #renamed the cols
filter(!is.na(col)) #remove empty from first column
Upvotes: 0