Sans Soleil

Reputation: 43

R conditionally map a repeated vector as column to a data frame

I would like to subtract a vector of group means from the original values, but I cannot figure out how to match each mean to the rows with the corresponding conditions. So far I have tried arranging the values so that they line up with the means, but even that fails.

library("reshape")
require('plyr')
require("dplyr")

The dataframe:

n    <- as.factor(rep(1:16, times = 2))
s    <- as.factor(rep(c("ja", "nein"), each = 8, times = 2))
b    <- as.factor(rep(c("red", "green", "blue", "pink"), times = 8))
zahl <- runif(32)
df   <- data.frame(n, s, b, zahl)

The means as a column:

df.mean <- melt(data.frame(cast(df, b~s, mean)), id=1, measured=2:3)

My incorrect attempt:

df.final <- df%>%
  mutate(r=1:32,
         trial=rep(1:2, each=16))%>%
  #arrange(r,n,trial,s,b)%>%   # this doesn't put "ja"/"nein" in the same order as the means
  mutate(mean.bs=rep(df.mean[,3], times=4),
         diff=zahl-mean.bs)

The result should look like this:

    n    s     b zahl  trial mean.bs  diff
1   1   ja   red 0.49     1  0.8025 -0.3125
2   2   ja green 0.59     1  0.6200 -0.0300
3   3   ja  blue 0.97     1  0.3175  0.6525
4   4   ja  pink 0.04     1  0.5225 -0.4825
5   9 nein   red 0.x      1  0.4775  0.x
6  10 nein green 0.x      1  0.3975  0.x
7  11 nein  blue 0.x      1  0.5625  0.x
8  12 nein  pink 0.x      1  0.3925  0.x
9   5   ja   red 0.x      1  0.8025 -0.x   # here means repeat
10  6   ja green 0.x      1  0.6200 -0.x
...

Is there perhaps a more direct way to do this (with a condition ...)?

Thank you!

Upvotes: 1

Views: 217

Answers (2)

Dominic Comtois

Reputation: 10411

OK, I'm not 100% sure this is what you want to achieve (setting a seed before using randomized data is a good idea), but try this, picking up after your df.mean <- ... line:

colnames(df.mean) <- c("b","s","mean.bs")
df$trial <- rep(1:2, each=16)
df2 <- merge(df, df.mean, by=c("b", "s"))
df2$diff <- df2$zahl - df2$mean.bs
df2 <- df2[order(df2$trial, df2$n),]
rownames(df2) <- NULL

head(df2)

      b  s n       zahl trial   mean.bs       diff
1   red ja 1 0.87370077     1 0.6972817  0.1764190
2 green ja 2 0.01389495     1 0.4272126 -0.4133177
3  blue ja 3 0.96772185     1 0.5276125  0.4401094
4  pink ja 4 0.80911187     1 0.3625441  0.4465678
5   red ja 5 0.47676424     1 0.6972817 -0.2205175
6 green ja 6 0.07390932     1 0.4272126 -0.3533033
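
For a reproducible run that does not depend on reshape, the same merge idea can be sketched with base R only. This is just one possible variant under stated assumptions: the seed value 42 is arbitrary, and aggregate() is swapped in for the cast()/melt() step; the names df.mean and df2 follow the code above.

```r
# Reproducible sketch; the seed value 42 is an arbitrary assumption
set.seed(42)
n    <- as.factor(rep(1:16, times = 2))
s    <- as.factor(rep(c("ja", "nein"), each = 8, times = 2))
b    <- as.factor(rep(c("red", "green", "blue", "pink"), times = 8))
zahl <- runif(32)
df   <- data.frame(n, s, b, zahl)
df$trial <- rep(1:2, each = 16)

# One mean per (b, s) combination, already in long format
df.mean <- aggregate(zahl ~ b + s, data = df, FUN = mean)
names(df.mean)[names(df.mean) == "zahl"] <- "mean.bs"

# Join the means back onto the rows by their grouping keys,
# so no manual rep() or row ordering is needed
df2 <- merge(df, df.mean, by = c("b", "s"))
df2$diff <- df2$zahl - df2$mean.bs
df2 <- df2[order(df2$trial, df2$n), ]
rownames(df2) <- NULL
```

Because the join is keyed on b and s, each row receives the mean of its own group regardless of how the rows are sorted.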

Upvotes: 1

akrun

Reputation: 887168

We can compute the group mean and the difference within a single mutate() call:

library(dplyr)
df %>%
  group_by(b, s) %>%
  mutate(mean.bs = mean(zahl), diff = zahl - mean.bs)
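
If dplyr is not available, the same grouped calculation can be sketched in base R with ave(), which returns a per-group summary aligned with the original rows. This is an alternative to the code above, not part of it; the seed value 1 is an arbitrary assumption, and the data frame is rebuilt here only so that the snippet runs on its own.

```r
# Rebuild the question's data frame; the seed value 1 is an arbitrary assumption
set.seed(1)
df <- data.frame(
  n    = as.factor(rep(1:16, times = 2)),
  s    = as.factor(rep(c("ja", "nein"), each = 8, times = 2)),
  b    = as.factor(rep(c("red", "green", "blue", "pink"), times = 8)),
  zahl = runif(32)
)

# ave() computes the mean within each (b, s) group and returns it
# in the original row order, so it can be assigned directly as a column
df$mean.bs <- ave(df$zahl, df$b, df$s, FUN = mean)
df$diff    <- df$zahl - df$mean.bs
```

As with group_by() followed by mutate(), the row order of df is left unchanged, so no arrange() or merge() step is required.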

Upvotes: 1
