LEE
LEE

Reputation: 328

How to do the fuzzy matching in r?

I have some data like below,

plot    Tsoil_11 Tsoil_12 Tsoil_13 Tsoil_21 Tsoil_22  SWC_11   SWC_12  SWC_13  SWC_21  SWC_22 
11        1        2         3       4         5       6       7       8       9       10
11b      11       12        13      14       15       16       17      18     19       20
11c      21       22        23      24       25       26       27      28     29       30
12       31       32        33      34       35        36      37      38     39       40
13       41       42        43      44       45        46       47      48     49       50
21       51       52        53      54       55        56       57      58     59       60
21b      61       62        63      64       65        66       67      68     69       70
22       71       72        73      74       75        76       77      78     79       80
...

My want to add two columns of Tsoil and SWC. The ideal data should be like below,

plot      Tsoil  SWC
11          1     6
11b        11    16
11c        21    26
12         32    12
13         43    48
21         54    59
21b        64    69
22         75    80
...

How to realize this in R?

Upvotes: 1

Views: 179

Answers (2)

Ian Campbell
Ian Campbell

Reputation: 24810

Here's a data.table approach:

We can use melt to change the data into long form. Then we can split the variables on _ using tstrsplit. Then we can filter for rows where the plot matches the variable. Finally, we can use dcast to go back to wide form.

library(data.table)
dcast(melt(setDT(data),id.var = "plot")[
  ,c("variable","obs") := tstrsplit(variable,"_")][
  gsub("\\D","",plot) == obs,.(plot,variable,value)]
  ,plot ~ variable)
   plot SWC Tsoil
1:   11   6     1
2:  11b  16    11
3:  11c  26    21
4:   12  37    32
5:   13  48    43
6:   21  59    54
7:  21b  69    64
8:   22  80    75

Upvotes: 1

akrun
akrun

Reputation: 887213

We can use pivot_longer and specify the names_sep as _, then filter values based on the 'ind' column comparison (suffix of column names) with the number only values from 'plot'

library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
  pivot_longer(cols = -plot, names_to = c( '.value', 'ind'), names_sep="_") %>%
  filter(str_remove(plot, "\\D+") == ind) %>%  
  #or use str_detect`
  #filter(str_detect(plot, ind)) %>%
  select(-ind)  
# A tibble: 8 x 3
# plot  Tsoil   SWC
#  <chr> <dbl> <dbl>
#1 11        1     6
#2 11b      11    16
#3 11c      21    26
#4 12       32    37
#5 13       43    48
#6 21       54    59
#7 21b      64    69
#8 22       75    80

data

df1 <- structure(list(plot = c("11", "11b", "11c", "12", "13", "21", 
"21b", "22"), Tsoil_11 = c(1L, 11L, 21L, 31L, 41L, 51L, 61L, 
71L), Tsoil_12 = c(2L, 12L, 22L, 32L, 42L, 52L, 62L, 72L), Tsoil_13 = c(3L, 
13L, 23L, 33L, 43L, 53L, 63L, 73L), Tsoil_21 = c(4L, 14L, 24L, 
34L, 44L, 54L, 64L, 74L), Tsoil_22 = c(5L, 15L, 25L, 35L, 45L, 
55L, 65L, 75L), SWC_11 = c(6L, 16L, 26L, 36L, 46L, 56L, 66L, 
76L), SWC_12 = c(7L, 17L, 27L, 37L, 47L, 57L, 67L, 77L), SWC_13 = c(8L, 
18L, 28L, 38L, 48L, 58L, 68L, 78L), SWC_21 = c(9L, 19L, 29L, 
39L, 49L, 59L, 69L, 79L), SWC_22 = c(10L, 20L, 30L, 40L, 50L, 
60L, 70L, 80L)), class = "data.frame", row.names = c(NA, -8L))

Upvotes: 1

Related Questions