KLM117
KLM117

Reputation: 467

Create new rows based on values in a column and assign these new rows a new value?

I have a data frame with 732 unique observations of 25 variables. What I would like to do is, assuming the reference and joining MeSH ID are not the same, create new rows below where I move the reference ID into the same column as the Joining MeSH ID, deleting the reference MeSH ID column and assigning a LinScore of 1 to the new row containing the reference MeSH ID. So if a row in the table currently looks something like this:

Disease Joining.Mesh.ID Company MoA Reference.MeSH.ID LinScore
Acute Myeloid Leukemia D015470 GSK VGF-B agonist D007951 0.9625

I want the final product to look something like this:

Disease Mesh ID Company Mode of Action LinScore
Acute Myeloid Leukemia D015470 GSK VGF-B agonist 0.9625
Acute Myeloid Leukemia D007951 GSK VGF-B agonist 1

I've asked a previous question that relates to this, and have some code which can create the new rows using pivot_longer() in dplyr, but I don't know how to assign the LinScore of 1 to these new rows.

As always, I'd be grateful for any suggestions!

Upvotes: 0

Views: 463

Answers (1)

IceCreamToucan
IceCreamToucan

Reputation: 28675

You can pivot_longer the columns that contain ID, then use if_else to set the LinScore equal to 1 for reference rows

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

df <- structure(list(Disease = "Acute Myeloid Leukemia", Joining.Mesh.ID = "D015470", 
                     Company = "GSK", MoA = "VGF-B agonist", Reference.MeSH.ID = "D007951", 
                     LinScore = 0.9625), row.names = c(NA, -1L), class = "data.frame")

df %>% 
  pivot_longer(cols = ends_with('.ID'),
                values_to = 'mesh_id') %>% 
  relocate(mesh_id, .after = Disease) %>%
  mutate(LinScore = if_else(grepl('Reference', name), 1, LinScore)) %>% 
  select(-name) 
#> # A tibble: 2 x 5
#>   Disease                mesh_id Company MoA           LinScore
#>   <chr>                  <chr>   <chr>   <chr>            <dbl>
#> 1 Acute Myeloid Leukemia D015470 GSK     VGF-B agonist    0.962
#> 2 Acute Myeloid Leukemia D007951 GSK     VGF-B agonist    1

Created on 2021-07-13 by the reprex package (v2.0.0)

Upvotes: 1

Related Questions