Reputation: 523
I am a computer science student and a novice R user.
Below is my data frame.
set.seed(1234)
df <- data.frame(
sex = rep(c('M','F'), 10),
profession = rep(c('Doctor','Lawyer'), each = 5),
participant = rep(1:10, 2),
x = runif(20, 1, 10),
y = runif(20, 1, 10))
I want to find the difference in x and in y between the two days for each participant. This should produce a 10-row data frame.
dday will replace day, as the values will be the differences between the days.
dday sex profession participant dx dy
0-1 M Doctor 1 5.22 1.26
.
.
.
Is there a proper way to do this in R?
Upvotes: 0
Views: 62
Reputation: 1421
You could also do it simply like this:
library(plyr)  # provides ddply()
set.seed(1)
df <- data.frame(
day = rep(c(0, 1), c(10, 10)),
sex = rep(c('M','F'), 10),
profession = rep(c('Doctor','Lawyer'), each = 5),
participant = rep(1:10, 2),
x = runif(20, 1, 10),
y = runif(20, 1, 10))
Now we need to aggregate by sex, profession and participant, and then write a function that returns two columns with the difference of x and of y. Remember that a function in R returns the last value it evaluates (in this example, the data frame at the end).
ddply(df, c("sex", "profession", "participant"),
      function(dat) {
        # Rows within each group keep their original order: day 0 first, day 1 second
        ddx <- dat$x[[2]] - dat$x[[1]]
        ddy <- dat$y[[2]] - dat$y[[1]]
        data.frame(dx = ddx, dy = ddy)
      })
The output (not reordered) is:
sex profession participant dx dy
1 F Doctor 2 3.9572263 -0.9337529
2 F Doctor 4 -0.6294785 3.6342897
3 F Lawyer 6 1.6292118 -1.7344123
4 F Lawyer 8 0.7850676 1.2878669
5 F Lawyer 10 2.1418901 0.3098424
6 M Doctor 1 -3.1910030 1.8730386
7 M Doctor 3 -4.1488559 5.5640663
8 M Doctor 5 0.9190749 -0.2446371
9 M Lawyer 7 -3.2924210 5.1612642
10 M Lawyer 9 0.0743912 -5.4104425
Hope this helps. I find the ddply call, as written, easy to understand.
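To illustrate the point about a function returning the last value it evaluates, here is a minimal base R sketch. The helper name diff_xy and the toy values are made up for illustration; they are not part of the data above.

```r
# Hypothetical helper: takes the two rows of one participant
# (day 0 first, day 1 second) and returns a one-row data frame.
diff_xy <- function(dat) {
  dx <- dat$x[[2]] - dat$x[[1]]
  dy <- dat$y[[2]] - dat$y[[1]]
  # No explicit return(): the value of the last expression is returned.
  data.frame(dx = dx, dy = dy)
}

# Toy input with made-up values
toy <- data.frame(x = c(1.5, 4.0), y = c(3.0, 2.0))
res <- diff_xy(toy)
print(res)
```

The same function could be passed to ddply in place of the anonymous one, which keeps the grouping call short.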
Upvotes: 0
Reputation: 4367
It appears that the day column is missing from the data.frame, although it is included in the picture:
library(dplyr)
set.seed(1234)
df <- data.frame(day = rep(c(0, 1), each = 10),
sex = rep(c('M', 'F'), 10),
profession = rep(c('Doctor', 'Lawyer'), each = 5),
participant = rep(1:10, 2),
x = runif(20, 1, 10),
y = runif(20, 1, 10))
df %>%
group_by(participant) %>%
mutate(day = paste0(lag(day), "-", day), dx = x - lag(x), dy = y - lag(y)) %>%
select(-x, -y) %>%
filter(!is.na(dx))
Source: local data frame [10 x 8]
Groups: participant [10]
day sex profession participant dx dy
<chr> <fctr> <fctr> <int> <dbl> <dbl>
1 0-1 M Doctor 1 5.2189909 1.2553112
2 0-1 F Doctor 2 -0.6959211 -0.3375603
3 0-1 M Doctor 3 -2.9388703 1.3106358
4 0-1 F Doctor 4 2.7004864 4.2057986
5 0-1 M Doctor 5 -5.1173959 -0.3393300
6 0-1 F Lawyer 6 1.7728652 -0.4583513
7 0-1 M Lawyer 7 2.4905478 -2.9200456
8 0-1 F Lawyer 8 0.3084325 -5.9026351
9 0-1 M Lawyer 9 -4.3142487 1.4472483
10 0-1 F Lawyer 10 -2.5382271 6.8542387
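For comparison, the same day-1-minus-day-0 differences can probably be sketched in base R without extra packages. This is only a sketch under assumptions: exactly two days (0 and 1), one row per participant per day, and the participant column spelled participant. The names d0, d1, m, and res are illustrative.

```r
set.seed(1234)
df <- data.frame(day = rep(c(0, 1), each = 10),
                 sex = rep(c('M', 'F'), 10),
                 profession = rep(c('Doctor', 'Lawyer'), each = 5),
                 participant = rep(1:10, 2),
                 x = runif(20, 1, 10),
                 y = runif(20, 1, 10))

# Split by day, then match rows on the participant-level columns
d0 <- df[df$day == 0, ]
d1 <- df[df$day == 1, ]
m <- merge(d0, d1, by = c("sex", "profession", "participant"),
           suffixes = c(".0", ".1"))

# Day 1 minus day 0, labelled "0-1" as in the question
res <- data.frame(dday = "0-1",
                  m[c("sex", "profession", "participant")],
                  dx = m$x.1 - m$x.0,
                  dy = m$y.1 - m$y.0)
res <- res[order(res$participant), ]
print(res)
```

The merge step makes the pairing of rows explicit, so it does not rely on the row order within each participant the way lag() does.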
Upvotes: 1