Matt Sabia
Matt Sabia

Reputation: 11

Copying data in R dataframe from one cell to another based on contents of another cell

Here's some example code to help everyone understand what I am trying to do here:

ID  PicName  Rating  Correct
1   flower   4       
1   Answer           1
2   cat      5       
2   Answer           1

What I'm trying to do is link up the "Answer" rows with the corresponding data in the row above them, then delete those "Answer" rows. So namely, I need the contents of the "Correct" column on "Answer" rows moved to the cell in the same column of the previous row. My instinct on this problem was initially to try to index those answer rows and search through the dataframe this way, but I'm not sure how to implement a command to take the data out of these specific cells and move it to other specific cells. Any insight on this problem would be greatly appreciated, thanks!

Upvotes: 1

Views: 1309

Answers (2)

IceCreamToucan
IceCreamToucan

Reputation: 28685

Using the data.table package you can go through each ID, select the first value of PicName and Rating, and the second value of Correct.

library(data.table)
setDT(df) # give df class "data.table"
df[, .( PicName = PicName[1] 
      , Rating  = Rating[1]
      , Correct = Correct[2])
   , by = ID]

Returns

   ID PicName Rating Correct
1:  1  flower      4       1
2:  2     cat      5       1

Another option is to merge the Answer rows with the others (gives same result)

setkey(df, ID)
merge(df[PicName != 'Answer', -'Correct']
      , df[PicName == 'Answer', .(ID, Correct)])

Data Used:

df <- fread( 
"ID  PicName  Rating  Correct
1   flower   4       NA
1   Answer   NA        1
2   cat      5       NA
2   Answer  NA         1")

===================

If you have more than one pair per ID group you can use the below code. This picks out rows where PicName == 'Answer' is FALSE, uses those rows for PicName and Rating columns, then uses the other rows for the Correct column.

library(data.table)
library(magrittr)
setDT(df) # give df class "data.table"

df[, (PicName == 'Answer') 
      %>% {.(PicName  = PicName[!.]
            , Rating  = Rating[!.]
            , Correct = Correct[.])}
   , by = ID]

Upvotes: 1

JanLauGe
JanLauGe

Reputation: 2335

@Renu's approach is great, however, I prefer dplyr / tidyverse:

df %>%
  group_by(ID) %>%
  summarise(
    PicName = first(PicName),
    Rating = first(Rating),
    Correct = nth(Correct, 2))

returns:

# A tibble: 2 x 4
     ID PicName Rating Correct
  <int> <fct>    <int>   <int>
1     1 flower       4       1
2     2 cat          5       1

Pretty much the same thing though.

Upvotes: 1

Related Questions