user976991

Reputation: 411

How to apply a function across columns of data.frame?

Let's say I have a data frame of numeric values like this:

   AA01.AVG_Beta AA02.AVG_Beta AA03.AVG_Beta AA04.AVG_Beta AA05.AVG_Beta
1     0.15851770    0.44264830    0.46662180    0.79579230   0.555430100
2     0.87148450    0.93462340    0.92591830    0.93812860   0.942683400
3     0.60907060    0.92463760    0.62698660    0.86852790   0.457659300
4     0.10728340    0.07848221    0.06340047    0.08589865   0.118239800
5     0.72353630    0.91198210    0.87339600    0.88050440   0.902925300
6     0.52616050    0.57114700    0.29431990    0.56032260   0.530103800
7     0.50321330    0.78129660    0.26986880    0.77825860   0.924097500
8     0.47808630    0.11267250    0.30519660    0.36128510   0.741012600
9     0.17698960    0.11461960    0.57776080    0.37801670   0.465766500
10    0.01268375    0.01370702    0.01194124    0.01227029   0.009222724

I want to replace every numeric value with a letter code using these conditions:

Avg beta 0-0.2   -> AA
Avg beta 0.4-0.6 -> AB
Avg beta 0.8-1   -> BB
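Written out as code, the three rules could look like the sketch below. It is a minimal sketch under stated assumptions: beta_to_code is a hypothetical name, each range is treated as inclusive at both ends (the question doesn't say), and values in the unlisted gaps (0.2 to 0.4 and 0.6 to 0.8) become NA instead of falling through to "BB".

```r
# Hypothetical helper: map a vector of average betas to letter codes.
# Assumption: ranges are inclusive at both ends; values in the gaps
# (0.2 to 0.4 and 0.6 to 0.8) are left as NA.
beta_to_code <- function(x) {
  out <- rep(NA_character_, length(x))
  out[x >= 0   & x <= 0.2] <- "AA"
  out[x >= 0.4 & x <= 0.6] <- "AB"
  out[x >= 0.8 & x <= 1]   <- "BB"
  out
}

beta_to_code(c(0.1585177, 0.4426483, 0.7957923, 0.9381286))
# returns c("AA", "AB", NA, "BB")
```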

So I wrote something like this:

apply(table, 2, function(x) ifelse(x > 0 & x < 0.2, "AA",
                                   ifelse(x > 0.4 & x < 0.6, "AB", "BB")))

But I get this:

      AA01.AVG_Beta AA02.AVG_Beta AA03.AVG_Beta AA04.AVG_Beta AA05.AVG_Beta
 [1,] "AA"          NA            NA            NA            NA           
 [2,] "BB"          NA            NA            NA            NA           
 [3,] "BB"          NA            NA            NA            NA           
 [4,] "AA"          NA            NA            NA            NA           
 [5,] "BB"          NA            NA            NA            NA           
 [6,] "AB"          NA            NA            NA            NA           
 [7,] "AB"          NA            NA            NA            NA           
 [8,] "AB"          NA            NA            NA            NA           
 [9,] "AA"          NA            NA            NA            NA           
[10,] "AA"          NA            NA            NA            NA           

Only the first column is converted; all the others come back as NA. Am I missing something related to for loops?
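apply() is designed for matrices and arrays, so when it is handed a data frame it first converts it with as.matrix(). If any column is not numeric (for example a character or factor column left over from reading the file), the whole matrix becomes character. The question doesn't show how table was built, so this can't explain the NAs for certain, but checking what apply() actually receives is a sensible first step. A minimal sketch using a hypothetical toy data frame:

```r
# Hypothetical toy stand-in for the question's data frame: the second
# column is stored as character to show what apply() then receives.
toy <- data.frame(a = c(0.15, 0.87), b = c("0.44", "0.93"),
                  stringsAsFactors = FALSE)

# Check each column's type before calling apply()
col_types <- sapply(toy, class)
col_types
# returns c(a = "numeric", b = "character")

# apply() coerces a data frame with as.matrix(); one non-numeric column
# makes the whole matrix character, so x > 0 becomes a string comparison
m <- as.matrix(toy)
typeof(m)
# returns "character"
```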

Thanks in advance

Upvotes: 3

Views: 9785

Answers (2)

Andrie

Reputation: 179388

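Use sapply instead of apply. A data frame is a list of columns, so sapply applies the function to each column as it is, whereas apply first converts the whole data frame to a matrix with as.matrix.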

Recreate your data:

dat <- read.table(text="
AA01.AVG_Beta AA02.AVG_Beta AA03.AVG_Beta AA04.AVG_Beta AA05.AVG_Beta
1     0.15851770    0.44264830    0.46662180    0.79579230   0.555430100
2     0.87148450    0.93462340    0.92591830    0.93812860   0.942683400
3     0.60907060    0.92463760    0.62698660    0.86852790   0.457659300
4     0.10728340    0.07848221    0.06340047    0.08589865   0.118239800
5     0.72353630    0.91198210    0.87339600    0.88050440   0.902925300
6     0.52616050    0.57114700    0.29431990    0.56032260   0.530103800
7     0.50321330    0.78129660    0.26986880    0.77825860   0.924097500
8     0.47808630    0.11267250    0.30519660    0.36128510   0.741012600
9     0.17698960    0.11461960    0.57776080    0.37801670   0.465766500
10    0.01268375    0.01370702    0.01194124    0.01227029   0.009222724
")
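read.table() works out the layout here on its own: when header is not given and the first line has one field fewer than the data lines, it sets header = TRUE and uses the first column as row names. A quick sketch that passes header = TRUE explicitly and checks the result on two rows of the data (dat_check is a hypothetical name):

```r
# Header line has 2 names, data lines have 3 fields: the extra first
# field on each data line becomes the row names
dat_check <- read.table(text = "
AA01.AVG_Beta AA02.AVG_Beta
1 0.15851770 0.44264830
2 0.87148450 0.93462340
", header = TRUE)

dim(dat_check)                # 2 rows, 2 columns
sapply(dat_check, is.numeric) # both columns should be TRUE
rownames(dat_check)           # taken from the first field of each line
```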

Use sapply:

sapply(dat, function(x) 
      ifelse (x>0 & x< 0.2, "AA",ifelse(x>0.4 & x<0.6,"AB", "BB"))
)

      AA01.AVG_Beta AA02.AVG_Beta AA03.AVG_Beta AA04.AVG_Beta AA05.AVG_Beta
 [1,] "AA"          "AB"          "AB"          "BB"          "AB"         
 [2,] "BB"          "BB"          "BB"          "BB"          "BB"         
 [3,] "BB"          "BB"          "BB"          "BB"          "AB"         
 [4,] "AA"          "AA"          "AA"          "AA"          "AA"         
 [5,] "BB"          "BB"          "BB"          "BB"          "BB"         
 [6,] "AB"          "AB"          "BB"          "AB"          "AB"         
 [7,] "AB"          "BB"          "BB"          "BB"          "BB"         
 [8,] "AB"          "AA"          "BB"          "BB"          "BB"         
 [9,] "AA"          "AA"          "AB"          "BB"          "AB"         
[10,] "AA"          "AA"          "AA"          "AA"          "AA"       
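The sapply call returns a character matrix. If you'd rather keep a data frame with the same column names, assigning the result of lapply into the data frame with [] is a common R idiom. A minimal sketch on two rows of the question's data; dat_small and to_code are hypothetical names. As in the answer, any value that fails the first two tests (including the 0.2 to 0.4 and 0.6 to 0.8 gaps) ends up as "BB".

```r
# Two rows of the question's data, so the example runs on its own
dat_small <- data.frame(AA01.AVG_Beta = c(0.15851770, 0.87148450),
                        AA02.AVG_Beta = c(0.44264830, 0.93462340))

# Same nested ifelse() rule as in the answer, as a named function
to_code <- function(x) {
  ifelse(x > 0 & x < 0.2, "AA",
         ifelse(x > 0.4 & x < 0.6, "AB", "BB"))
}

# Assigning into codes[] replaces the columns but keeps the
# data.frame class and the column names
codes <- dat_small
codes[] <- lapply(dat_small, to_code)
codes
```

On the full data, codes <- dat; codes[] <- lapply(dat, to_code) should give the same letters as the sapply output, just as a data frame.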

Upvotes: 2

James

Reputation: 66834

You can use cut:

x <- c(0.15,0.2,0.4,0.6,0.8,1.0)
cut(x,c(0,0.2,0.4,0.6,0.8,1.0),labels=c("AA",NA,"AB",NA,"BB"))
[1] AA   AA   <NA> AB   <NA> BB  
Levels: AA <NA> AB <NA> BB
Warning message:
In `levels<-`(`*tmp*`, value = c("AA", NA, "AB", NA, "BB")) :
  duplicated levels will not be allowed in factors anymore

Note the warning: it appears because I used NA as the label for both gaps in your partitions (0.2 to 0.4 and 0.6 to 0.8), which produces duplicated levels.
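One way to avoid duplicated labels is to give every interval its own label, convert the factor to character, and then blank out the gap labels. A minimal sketch; beta_cut and the placeholder labels "gap1"/"gap2" are hypothetical, and include.lowest = TRUE is an addition so that a value of exactly 0 falls into AA instead of becoming NA.

```r
# Label all five intervals distinctly (placeholder labels "gap1"/"gap2"
# are hypothetical), then turn the gap labels into NA after converting
# the factor to character.
beta_cut <- function(x) {
  f <- cut(x, breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
           labels = c("AA", "gap1", "AB", "gap2", "BB"),
           include.lowest = TRUE)
  out <- as.character(f)
  out[out %in% c("gap1", "gap2")] <- NA
  out
}

x <- c(0.15, 0.2, 0.4, 0.6, 0.8, 1.0)
beta_cut(x)
# returns c("AA", "AA", NA, "AB", NA, "BB")
```

To recode every column of the question's data frame, codes <- dat; codes[] <- lapply(dat, beta_cut) applies it column by column.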

Upvotes: 4
