Nicholas Hayden

Reputation: 523

How to subtract grouped consecutive data in R

I am a computer science student and a novice R user.

Below is my data frame.

set.seed(1234)
df <- data.frame(
  sex = rep(c('M','F'), 10),
  profession = rep(c('Doctor','Lawyer'), each = 5),
  participant = rep(1:10, 2),
  x = runif(20, 1, 10),
  y = runif(20, 1, 10))

[Image: the data frame, which also includes a day column]

For each participant, I want the differences in x and y between the two days. This should produce a 10-row data frame.

A dday column will replace day, and its values will label the pair of days being differenced (e.g. 0-1).

dday sex profession participant dx   dy
0-1  M   Doctor     1           5.22 1.26
.
.
.

Is there an idiomatic way to do this in R?

Upvotes: 0

Views: 62

Answers (2)

Umberto

Reputation: 1421

You could also do it simply like this (note the added day column):

set.seed(1)

df <- data.frame(
  day = rep(c(0, 1), c(10, 10)),
  sex = rep(c('M','F'), 10),
  profession = rep(c('Doctor','Lawyer'), each = 5),
  participant = rep(1:10, 2),
  x = runif(20, 1, 10),
  y = runif(20, 1, 10))

Now we need to group by sex, profession and participant, and then write a function that returns two columns: the day-to-day difference of x and of y. Remember that an R function returns the value of the last expression it evaluates (in this example, the data frame at the end).

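library(plyr)

ddply(df, c("sex", "profession", "participant"),
  function(dat) {
    # dat holds one participant's two rows (day 0 and day 1)
    ddx <- dat$x[dat$day == 1] - dat$x[dat$day == 0]
    ddy <- dat$y[dat$day == 1] - dat$y[dat$day == 0]
    data.frame(dx = ddx, dy = ddy)  # last expression, so this is returned
  })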

Output (rows not reordered by participant):

   sex profession participant         dx         dy
1    F     Doctor           2  3.9572263 -0.9337529
2    F     Doctor           4 -0.6294785  3.6342897
3    F     Lawyer           6  1.6292118 -1.7344123
4    F     Lawyer           8  0.7850676  1.2878669
5    F     Lawyer          10  2.1418901  0.3098424
6    M     Doctor           1 -3.1910030  1.8730386
7    M     Doctor           3 -4.1488559  5.5640663
8    M     Doctor           5  0.9190749 -0.2446371
9    M     Lawyer           7 -3.2924210  5.1612642
10   M     Lawyer           9  0.0743912 -5.4104425
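To make the point about return values concrete, here is a standalone sketch of a per-group function of the kind passed to ddply. Nothing calls return(): the data frame built on the last line is what the function gives back. The name diff_by_day and the toy two-row group are hypothetical, chosen only for illustration, and the sketch assumes each group has exactly one day-0 row and one day-1 row.

```r
# Hypothetical per-group function: difference between day 1 and day 0.
# No explicit return() is needed; R returns the last evaluated expression.
diff_by_day <- function(dat) {
  dx <- dat$x[dat$day == 1] - dat$x[dat$day == 0]
  dy <- dat$y[dat$day == 1] - dat$y[dat$day == 0]
  data.frame(dx = dx, dy = dy)  # last expression: this is the return value
}

# Hypothetical two-row group (one participant, days 0 and 1)
grp <- data.frame(day = c(0, 1), x = c(2, 5), y = c(4, 3))
diff_by_day(grp)
#   dx dy
# 1  3 -1
```

Selecting rows by dat$day rather than by position ([[1]], [[2]]) keeps the result correct even if the rows within a group are not sorted by day.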

Hope this helps. I find the ddply version, as written, easy to understand.

Upvotes: 0

manotheshark

Reputation: 4367

It appears that the day column is missing from your data.frame, although it is included in the picture:

library(dplyr)

set.seed(1234)
df <- data.frame(day = rep(c(0, 1), each = 10),
             sex = rep(c('M', 'F'), 10),
             profession = rep(c('Doctor', 'Lawyer'), each = 5),
             pariticpant = rep(1:10, 2),
             x = runif(20, 1, 10),
             y = runif(20, 1, 10))

df %>%
  group_by(pariticpant) %>%
  mutate(day = paste0(lag(day), "-", day), dx = x - lag(x), dy = y - lag(y)) %>%
  select(-x, -y) %>%
  filter(!is.na(dx))

Source: local data frame [10 x 8]
Groups: pariticpant [10]

     day    sex profession pariticpant         dx         dy
   <chr> <fctr>     <fctr>       <int>      <dbl>      <dbl>
1    0-1      M     Doctor           1  5.2189909  1.2553112
2    0-1      F     Doctor           2 -0.6959211 -0.3375603
3    0-1      M     Doctor           3 -2.9388703  1.3106358
4    0-1      F     Doctor           4  2.7004864  4.2057986
5    0-1      M     Doctor           5 -5.1173959 -0.3393300
6    0-1      F     Lawyer           6  1.7728652 -0.4583513
7    0-1      M     Lawyer           7  2.4905478 -2.9200456
8    0-1      F     Lawyer           8  0.3084325 -5.9026351
9    0-1      M     Lawyer           9 -4.3142487  1.4472483
10   0-1      F     Lawyer          10 -2.5382271  6.8542387
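If you would rather not depend on dplyr, the same pairing can be sketched in base R, assuming each participant has exactly one day-0 row and one day-1 row. The data are split by day, merge() lines up each participant's two rows, and the day-1 values are subtracted from the day-0 values. In this sketch the column is spelled participant, and the object names day0, day1, paired and result are illustrative, not from the original answers.

```r
# Base-R sketch (no dplyr). Assumes exactly one day-0 and one day-1 row
# per participant.
set.seed(1234)
df <- data.frame(day = rep(c(0, 1), each = 10),
                 sex = rep(c('M', 'F'), 10),
                 profession = rep(c('Doctor', 'Lawyer'), each = 5),
                 participant = rep(1:10, 2),
                 x = runif(20, 1, 10),
                 y = runif(20, 1, 10))

day0 <- df[df$day == 0, ]
day1 <- df[df$day == 1, c("participant", "x", "y")]

# Pair each participant's day-0 row with their day-1 row;
# the shared x and y columns become x0/x1 and y0/y1.
paired <- merge(day0, day1, by = "participant", suffixes = c("0", "1"))

result <- data.frame(dday = "0-1",
                     sex = paired$sex,
                     profession = paired$profession,
                     participant = paired$participant,
                     dx = paired$x1 - paired$x0,
                     dy = paired$y1 - paired$y0)
result
```

Because the random numbers are drawn in the same order as in the dplyr example, dx for participant 1 should come out at about 5.22, in line with the output above.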

Upvotes: 1
