fateme

Reputation: 33

Multiplying a random 50% of the columns in a data frame by a fixed number

I want to multiply a randomly chosen 50% of the columns in my data frame by a fixed number and keep the other columns unchanged.

My code only keeps up to 50% of the columns at random; it does not multiply them or retain the rest.

My data:

head(df1)
        V1       V2        V3
1 0.034935 0.034935 -0.006482
2 0.034935 0.043194  0.012351
3 0.043194 0.043194  0.012351

I used:

df2 <- df1[, sample(1:ncol(df1), 0.5 * ncol(df1))]

Upvotes: 1

Views: 79

Answers (2)

TinglTanglBob

Reputation: 647

I think the problem is here:

sample(1:ncol(df1), 0.5*ncol(df1))

If you pass a non-integer value to sample's size argument, the fractional part seems to be cut off (floored).

Try:

length(sample(1:3, 1.2)) # result: 1
length(sample(1:3, 1.4)) # result: 1
length(sample(1:3, 1.6)) # result: 1
length(sample(1:3, 1.8)) # result: 1
length(sample(1:3, 2.99)) # result: 2

So this

0.5*ncol(df1)

undershoots the 50% target: when the number of columns is odd, the number of sampled columns is rounded down.
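To avoid depending on how sample() treats a fractional size, you can round the count yourself before sampling. The following is only a minimal sketch; the toy data frame and the names half, n_down and n_up are made up for illustration and are not part of the question:

```r
# Toy data frame with an odd number of columns (illustrative values)
df1 <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)

half <- 0.5 * ncol(df1)      # 1.5 for three columns

n_down <- floor(half)        # 1: round down explicitly
n_up   <- ceiling(half)      # 2: round up instead

# Pass a whole number to sample.int() so the size is never ambiguous
cols <- sample.int(ncol(df1), n_up)
length(cols)                 # 2
```

Whether floor() or ceiling() is the right choice depends on whether "50%" should mean "at most half" or "at least half". round() is a less obvious fit here, because R rounds halves to the even digit.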

You could try this as a simple workaround:

df_test = data.frame(A = 1:5, B = 1:5, C = 1:5)
df_test

selecter = sample(c(TRUE, FALSE), NCOL(df_test), replace = TRUE)
factor = 2

df_test[selecter] = df_test[selecter] * factor

Instead of selecting exactly 50% of the columns, this approach selects each column independently with a probability of 50%, which should come close to 50% on average over many runs. The drawback is that a single run can select all of the columns or none of them.
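If that drawback matters, one possible variation (a sketch, not tested beyond this toy example) is to build a balanced TRUE/FALSE mask first and then shuffle it, so the number of selected columns is fixed. The names mask and multiplier are illustrative; multiplier is used instead of factor only to avoid masking the base function of that name:

```r
df_test <- data.frame(A = 1:5, B = 1:5, C = 1:5)
multiplier <- 2

# Alternating TRUE/FALSE mask, one entry per column.
# Starting with TRUE means an odd column count rounds up.
mask <- rep(c(TRUE, FALSE), length.out = ncol(df_test))

# Shuffle the mask so the selected columns are random
selecter <- sample(mask)

df_test[selecter] <- df_test[selecter] * multiplier
sum(selecter)   # 2 of the 3 columns are selected
```

With three columns this always multiplies exactly two of them; swapping the order to c(FALSE, TRUE) would round down to one instead.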

Upvotes: 1

markus

Reputation: 26343

Try

df1 <- iris[1:3, 1:4]
df1
#  Sepal.Length Sepal.Width Petal.Length Petal.Width
#1          5.1         3.5          1.4         0.2
#2          4.9         3.0          1.4         0.2
#3          4.7         3.2          1.3         0.2

Sample from the columns - don't forget to set a seed

set.seed(42)
cols <- sample(1:ncol(df1), 0.5*ncol(df1)) # columns to multiply
other_cols <- setdiff(1:ncol(df1), cols)   # other columns

Do the multiplication and combine the result with the columns that were not multiplied

number <- 2
df2 <- cbind(df1[cols] * number,
             df1[other_cols])[names(df1)]

The part [names(df1)] at the end arranges the columns of df2 in the original order.

Result

df2
#  Sepal.Length Sepal.Width Petal.Length Petal.Width
#1          5.1         3.5          2.8         0.4
#2          4.9         3.0          2.8         0.4
#3          4.7         3.2          2.6         0.4
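As an alternative sketch, the same result can probably be reached without cbind() and reordering by assigning into a copy of the data frame. The explicit floor() is an addition here, to make the rounding of the column count visible:

```r
df1 <- iris[1:3, 1:4]

set.seed(42)
# Columns to multiply; floor() makes the rounding explicit
cols <- sample.int(ncol(df1), floor(0.5 * ncol(df1)))
number <- 2

df2 <- df1                        # work on a copy so df1 stays untouched
df2[cols] <- df2[cols] * number   # only the sampled columns change
```

Because the columns are modified in place, df2 keeps the column order of df1. Which columns are picked for a given seed can differ between R versions, since R 3.6.0 changed the default sampling algorithm.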

Upvotes: 1
