Reputation: 94
I am struggling to convert a nested tabular and space delimited character string to wide data. My data looks something like this:
#Sample data
df <- tibble(id = 1:3,
text_table = c(NA,
"P1\tP2\tL1\n
M1\t15.09\t10.45\n
M2\t3040\t1959\n
M3\t0\t660\n",
NA))
df
# A tibble: 3 × 2
id text_table
<int> <chr>
1 1 NA
2 2 "P1\tP2\tL1\n\n M1\t15.09\t10.45\n\n M2\t3040\t1959\n\n …
3 3 NA
Using read_table
from readr
package I can convert a single line to a text string, but I would like to apply this to entire column (text_table
) and then convert the data to none nested wide data.
library(readr)
read_table(df$text_table[2])
# A tibble: 3 × 3
P1 P2 L1
<chr> <dbl> <dbl>
1 M1 15.1 10.4
2 M2 3040 1959
3 M3 0 660
My desired output would look something like:
# A tibble: 3 × 7
id P2_M1 P2_M2 P2_M3 L1_M1 L1_M2 L1_M3
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 NA NA NA NA NA NA
2 2 15.1 3040 0 10.4 1959 660
3 3 NA NA NA NA NA NA
I don´t know how to apply the function read_table
to all the column elements. Using apply
and lapply
have not worked for me.
Upvotes: 1
Views: 210
Reputation: 11878
I’d first write a helper function to transform a single table to the format you are after:
library(tidyverse)
df <- tibble(
id = 1:3,
text_table = c(
NA,
"P1\tP2\tL1\n
M1\t15.09\t10.45\n
M2\t3040\t1959\n
M3\t0\t660\n",
NA
)
)
read_text_table <- function(x) {
if (is.na(x)) return(tibble())
read_table(I(x)) |>
pivot_wider(
names_from = P1,
values_from = c(P2, L1)
)
}
read_text_table(df$text_table[2])
#> # A tibble: 1 × 6
#> P2_M1 P2_M2 P2_M3 L1_M1 L1_M2 L1_M3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 15.1 3040 0 10.4 1959 660
Because the function is going to be applied to each element in the column,
it needs to also gracefully handle an NA
input. Here we just return an
empty tibble in that case.
Then use rowwise()
to apply it to each element in the text_table
column:
tables <- df |>
rowwise(id) |>
summarise(
read_text_table(text_table)
)
#> `summarise()` has grouped output by 'id'. You can override using the `.groups`
#> argument.
tables
#> # A tibble: 1 × 7
#> # Groups: id [1]
#> id P2_M1 P2_M2 P2_M3 L1_M1 L1_M2 L1_M3
#> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 15.1 3040 0 10.4 1959 660
Because the read_text_table()
helper returns a 0-row data frame when there
is no table to read, we need a join to get the final result:
df |>
select(id) |>
left_join(tables, by = "id")
#> # A tibble: 3 × 7
#> id P2_M1 P2_M2 P2_M3 L1_M1 L1_M2 L1_M3
#> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 NA NA NA NA NA NA
#> 2 2 15.1 3040 0 10.4 1959 660
#> 3 3 NA NA NA NA NA NA
Upvotes: 1