Reputation: 33
I want to multiply a randomly chosen 50% of the columns in my data frame by a fixed number and leave the other columns unchanged.
My code only keeps a random 50% of the columns instead of multiplying them.
I used:
head(df1)
V1 V2 V3
1 0.034935 0.034935 -0.006482
2 0.034935 0.043194 0.012351
3 0.043194 0.043194 0.012351
df2<- df1[,sample(1:ncol(df1), 0.5*ncol(df1))]
Upvotes: 1
Views: 79
Reputation: 647
I think the problem is here:
sample(1:ncol(df1), 0.5*ncol(df1))
If you pass a non-integer value to sample's size argument, the fractional part seems to be cut off (floored).
Try:
length(sample(1:3, 1.2)) # result: 1
length(sample(1:3, 1.4)) # result: 1
length(sample(1:3, 1.6)) # result: 1
length(sample(1:3, 1.8)) # result: 1
length(sample(1:3, 2.99)) # result: 2
So this
0.5*ncol(df1)
selects fewer than 50% of the columns whenever the number of columns is odd, because the sample size is floored. With 3 columns, for example, only 1 column is sampled.
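To make the rounding explicit instead of relying on truncation, the sample size can be computed up front. This is a minimal sketch, assuming that rounding up is what is wanted for an odd number of columns; df_demo, n_cols, n_pick and picked are illustrative names, not part of the question.

```r
# Illustrative data frame with an odd number of columns
df_demo <- data.frame(A = 1:5, B = 1:5, C = 1:5)

n_cols <- ncol(df_demo)

# Assumption: round up, so 3 columns gives 2 sampled columns.
# Use floor() or round() here if a different rule is preferred.
n_pick <- ceiling(0.5 * n_cols)

# Draw n_pick distinct column indices
picked <- sample(seq_len(n_cols), n_pick)
length(picked)  # 2
```

Choosing between ceiling() and floor() only matters when the number of columns is odd; for an even count both give exactly half.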
You could try this as a simple workaround:
df_test = data.frame(A = 1:5, B = 1:5, C = 1:5)
df_test
selecter = sample(c(TRUE, FALSE), NCOL(df_test), replace = TRUE)
factor = 2
df_test[selecter] = df_test[selecter] * factor
Instead of selecting exactly 50% of the columns, this approach selects each column independently with a 50% chance, which on average comes close to half. The drawback of this method is that a single run can end up with all columns selected, or none.
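How often that drawback occurs can be worked out directly: with independent 50% draws, each of the two extreme outcomes has probability 0.5^n for n columns. The sketch below computes that and shows one possible way to redraw until the selection is mixed; p_all_or_none and the redraw loop are illustrative additions and assume at least two columns.

```r
# Probability that independent 50% draws select all columns or none
p_all_or_none <- function(n_cols) 2 * 0.5^n_cols

p_all_or_none(3)   # 0.25
p_all_or_none(10)  # 0.001953125

# Hypothetical guard: redraw until at least one column is selected
# and at least one is left out (needs n_cols >= 2 to terminate)
n_cols <- 3
repeat {
  selecter <- sample(c(TRUE, FALSE), n_cols, replace = TRUE)
  if (any(selecter) && !all(selecter)) break
}
```

For the three-column example, one run in four selects everything or nothing, so the guard is worth having for narrow data frames.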
Upvotes: 1
Reputation: 26343
Try
df1 <- iris[1:3, 1:4]
df1
# Sepal.Length Sepal.Width Petal.Length Petal.Width
#1 5.1 3.5 1.4 0.2
#2 4.9 3.0 1.4 0.2
#3 4.7 3.2 1.3 0.2
Sample from the columns, and don't forget to set a seed so the result is reproducible
set.seed(42)
cols <- sample(1:ncol(df1), 0.5*ncol(df1)) # columns to multiply
other_cols <- setdiff(1:ncol(df1), cols) # other columns
Do the multiplication and combine the result with the columns that were not multiplied
number <- 2
df2 <- cbind(df1[cols] * number,
df1[other_cols])[names(df1)]
The part [names(df1)] at the end arranges the columns of df2 in the original order.
Result
df2
# Sepal.Length Sepal.Width Petal.Length Petal.Width
#1 5.1 3.5 2.8 0.4
#2 4.9 3.0 2.8 0.4
#3 4.7 3.2 2.6 0.4
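The same effect can probably be reached by assigning back into a copy of the data frame, which keeps the column order without the reordering step. This is a sketch under the same setup; df3 is an illustrative name, and which columns are drawn depends on the seed and on the R version, so the multiplied columns may differ from the output above.

```r
df1 <- iris[1:3, 1:4]
number <- 2

set.seed(42)
# floor() makes the truncation of the sample size explicit
cols <- sample(seq_len(ncol(df1)), floor(0.5 * ncol(df1)))

# Work on a copy so df1 stays unchanged
df3 <- df1
df3[cols] <- df3[cols] * number

identical(names(df3), names(df1))  # TRUE
```

Because only the sampled columns are overwritten, the untouched columns never leave their positions and no cbind() is needed.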
Upvotes: 1