Jia Gao

Reputation: 1292

Create new column by adding conditions on rows in data table in R

As the title says, it's complicated to describe, so I'll just show the code, what I got, and what I want.

library(data.table)

set.seed(1)
df <- data.frame('X1' = rnorm(10),
                 'X2' = rnorm(10),
                 'X3' = c(rep('A', 5), rep('B', 5)))

## create a new column 'SPX2' which is the smallest positive number of X2
## within each group (A and B)

setDT(df)[X2 > 0, SPX2 := min(X2), by = X3]
df

Then I got this result:

            X1          X2 X3      SPX2
 1: -0.6264538  1.51178117  A 0.3898432
 2:  0.1836433  0.38984324  A 0.3898432
 3: -0.8356286 -0.62124058  A        NA
 4:  1.5952808 -2.21469989  A        NA
 5:  0.3295078  1.12493092  A 0.3898432
 6: -0.8204684 -0.04493361  B        NA
 7:  0.4874291 -0.01619026  B        NA
 8:  0.7383247  0.94383621  B 0.5939013
 9:  0.5757814  0.82122120  B 0.5939013
10: -0.3053884  0.59390132  B 0.5939013

What I want is:

            X1          X2 X3      SPX2
 1: -0.6264538  1.51178117  A 0.3898432
 2:  0.1836433  0.38984324  A 0.3898432
 3: -0.8356286 -0.62124058  A 0.3898432
 4:  1.5952808 -2.21469989  A 0.3898432
 5:  0.3295078  1.12493092  A 0.3898432
 6: -0.8204684 -0.04493361  B 0.5939013
 7:  0.4874291 -0.01619026  B 0.5939013
 8:  0.7383247  0.94383621  B 0.5939013
 9:  0.5757814  0.82122120  B 0.5939013
10: -0.3053884  0.59390132  B 0.5939013

The reason is that I want to create a new column, df$X4 <- df$SPX2 - df$X2, or do other operations that require SPX2 to be filled in for every row as shown above. I searched and found several posts like the one here, but that's not what I'm trying to do.

Does anyone know how to achieve this?

Upvotes: 4

Views: 1886

Answers (2)

jav

Reputation: 1495

Using the data.table package:

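library(data.table)

setDT(df)
# for each group of X3, take the minimum over the positive values of X2 only
df[, SPX2 := min(X2[X2 > 0]), by = X3]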

What this does: for each value of X3, it subsets X2 to its positive values (i.e. X2[X2 > 0]) and then takes the minimum of those. Note that if a group has no positive values (i.e. X2[X2 > 0] is empty), min() returns Inf and issues a warning. Keep this in mind, especially if you want to do any further calculations using SPX2.
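One way to handle that edge case is to return NA instead of Inf when a group has no positive values. The sketch below assumes a hypothetical extra group "C" (added only to trigger the edge case), and the names dt and pos are illustrative; it then computes the X4 column from the question, which is NA rather than Inf for such groups.

```r
library(data.table)

set.seed(1)
dt <- data.table(X1 = rnorm(10),
                 X2 = rnorm(10),
                 X3 = rep(c("A", "B"), each = 5))
# Hypothetical group "C" with no positive X2 values, to show the edge case
dt <- rbind(dt, data.table(X1 = 0, X2 = c(-1, -2), X3 = "C"))

# Smallest positive X2 per group; NA_real_ (not Inf) when a group has none
dt[, SPX2 := {
  pos <- X2[X2 > 0]
  if (length(pos) > 0) min(pos) else NA_real_
}, by = X3]

# Follow-up column from the question; NA propagates instead of Inf
dt[, X4 := SPX2 - X2]
```

Returning NA_real_ keeps the column numeric (double) in every group, and downstream arithmetic yields NA rather than a misleading Inf.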

As for your question about why X2[X2 > 0] works, think about it as follows: within each group of X3, X2 refers to the vector of X2 values belonging to that group. You can perform regular vector operations on this vector, one of which is subsetting via X2 > 0. It works much like the following:

x2 = c(-1, 1, 2, 3, -2, 4)
x2[x2 > 0]
# [1] 1 2 3 4
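The same per-group idea can be made explicit in base R, which may help see what the grouped expression is doing. This is a sketch using split(), sapply() and ave() rather than data.table; it is conceptually similar to by = X3, not a description of data.table's internals, and the name groups is illustrative.

```r
set.seed(1)
df <- data.frame(X1 = rnorm(10),
                 X2 = rnorm(10),
                 X3 = rep(c("A", "B"), each = 5))

# split() returns a list holding one X2 vector per group of X3
groups <- split(df$X2, df$X3)

# Apply the same vector subsetting to each group's vector:
# a named vector with one minimum per group (A, B)
per_group <- sapply(groups, function(x2) min(x2[x2 > 0]))

# ave() runs the same per-group function but returns one value per row,
# i.e. the group result recycled across every row of that group
df$SPX2 <- ave(df$X2, df$X3, FUN = function(x2) min(x2[x2 > 0]))
```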

Hope this helps!

Upvotes: 1

Aramis7d

Reputation: 2496

tidyverse alternative:

library(dplyr)

df %>%
  group_by(X3) %>%
  mutate(SPX2 = min(X2[X2 > 0]))

which gives:

           X1          X2     X3      SPX2
        <dbl>       <dbl> <fctr>     <dbl>
 1 -0.6264538  1.51178117      A 0.3898432
 2  0.1836433  0.38984324      A 0.3898432
 3 -0.8356286 -0.62124058      A 0.3898432
 4  1.5952808 -2.21469989      A 0.3898432
 5  0.3295078  1.12493092      A 0.3898432
 6 -0.8204684 -0.04493361      B 0.5939013
 7  0.4874291 -0.01619026      B 0.5939013
 8  0.7383247  0.94383621      B 0.5939013
 9  0.5757814  0.82122120      B 0.5939013
10 -0.3053884  0.59390132      B 0.5939013
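Because the group minimum is recycled to every row, the X4 column from the question can be computed in the same mutate() call, since later expressions in mutate() can refer to columns created earlier in that call. A minimal sketch, assuming the question's data and an illustrative result name; ungroup() just drops the grouping afterwards.

```r
library(dplyr)

set.seed(1)
df <- data.frame(X1 = rnorm(10),
                 X2 = rnorm(10),
                 X3 = rep(c("A", "B"), each = 5))

result <- df %>%
  group_by(X3) %>%
  # SPX2 is one value per group, recycled to every row in that group
  mutate(SPX2 = min(X2[X2 > 0]),
         # X4 can use SPX2 immediately within the same mutate()
         X4 = SPX2 - X2) %>%
  ungroup()
```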

Upvotes: 2
