TBP
TBP

Reputation: 94

Converting a space and tabulated separated table string nested in data frame to wide data in R

I am struggling to convert a nested tabular and space delimited character string to wide data. My data looks something like this:

#Sample data
df <- tibble(id = 1:3, 
             text_table = c(NA,
                            "P1\tP2\tL1\n
                            M1\t15.09\t10.45\n
                            M2\t3040\t1959\n
                            M3\t0\t660\n",
                            NA))

df
# A tibble: 3 × 2
     id text_table                                                                                                                
  <int> <chr>                                                                                                                     
1     1  NA                                                                                                                       
2     2 "P1\tP2\tL1\n\n                            M1\t15.09\t10.45\n\n                            M2\t3040\t1959\n\n            …
3     3  NA    

Using read_table from readr package I can convert a single line to a text string, but I would like to apply this to entire column (text_table) and then convert the data to none nested wide data.

library(readr)
read_table(df$text_table[2])
# A tibble: 3 × 3
  P1        P2     L1
  <chr>  <dbl>  <dbl>
1 M1      15.1   10.4
2 M2    3040   1959  
3 M3       0    660

My desired output would look something like:

# A tibble: 3 × 7
     id P2_M1 P2_M2 P2_M3 L1_M1 L1_M2 L1_M3
  <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1  NA      NA    NA  NA      NA    NA
2     2  15.1  3040     0  10.4  1959   660
3     3  NA      NA    NA  NA      NA    NA

I don´t know how to apply the function read_tableto all the column elements. Using apply and lapply have not worked for me.

Upvotes: 1

Views: 210

Answers (1)

Mikko Marttila
Mikko Marttila

Reputation: 11878

I’d first write a helper function to transform a single table to the format you are after:

library(tidyverse)

df <- tibble(
  id = 1:3,
  text_table = c(
    NA,
    "P1\tP2\tL1\n
    M1\t15.09\t10.45\n
    M2\t3040\t1959\n
    M3\t0\t660\n",
    NA
  )
)

read_text_table <- function(x) {
  if (is.na(x)) return(tibble())
  
  read_table(I(x)) |>
    pivot_wider(
      names_from = P1,
      values_from = c(P2, L1)
    )
}

read_text_table(df$text_table[2])
#> # A tibble: 1 × 6
#>   P2_M1 P2_M2 P2_M3 L1_M1 L1_M2 L1_M3
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  15.1  3040     0  10.4  1959   660

Because the function is going to be applied to each element in the column, it needs to also gracefully handle an NA input. Here we just return an empty tibble in that case.

Then use rowwise() to apply it to each element in the text_table column:

tables <- df |>
  rowwise(id) |>
  summarise(
    read_text_table(text_table)
  )
#> `summarise()` has grouped output by 'id'. You can override using the `.groups`
#> argument.

tables
#> # A tibble: 1 × 7
#> # Groups:   id [1]
#>      id P2_M1 P2_M2 P2_M3 L1_M1 L1_M2 L1_M3
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     2  15.1  3040     0  10.4  1959   660

Because the read_text_table() helper returns a 0-row data frame when there is no table to read, we need a join to get the final result:

df |>
  select(id) |>
  left_join(tables, by = "id")
#> # A tibble: 3 × 7
#>      id P2_M1 P2_M2 P2_M3 L1_M1 L1_M2 L1_M3
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1  NA      NA    NA  NA      NA    NA
#> 2     2  15.1  3040     0  10.4  1959   660
#> 3     3  NA      NA    NA  NA      NA    NA

Upvotes: 1

Related Questions