Add columns based on repetitions and conditions of other columns in R

Question

I'm a bit lost about the best way to perform what I want in R.

I have a data frame with several columns. One of them is user and another is number (values 0, 1, 2, 3, 4, 5). A user can appear more than once, in which case the number column should hold a sequence of numbers for that user. It can look something like this:

user number  column B column C
1    0        85      200
2    1        165     852
1    2        200     1000
3    0        15      689
3    1        89      896
4    0        78      582
4    2        96      586

What I would like to obtain is:

user number status days
1    0      0       200
3    0      1       50 ->(value column C, user 3, number 1)-(value column B, user 3, number 0)
4    0      0       200

I want to add two columns to the dataframe based on values from these users.

If a user has number == 0 and is unique, then status == 0 and days == some predefined number.
If a user has number == 0 and is not unique:
- If the second appearance of the user has number == 1, then status == 1 and days is the value of column C at the second appearance minus the value of column B at the first appearance.
- If the second appearance of the user has number != 1, then status == 0 and days == some predefined number.
Users that don't have number == 0 are removed from the dataset.
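
The rules above can be sketched in base R roughly as follows. This is only one possible reading of them: the column names B and C, the helper summarise_user and the value default_days <- 200 (taken from the example output) are assumptions, and "first" and "second appearance" are interpreted as the order of number within each user. With the sample data as posted, user 3 gives 896 - 15 = 881 rather than the 50 shown in the desired output, which appears to be illustrative.

```r
# Sample data from the question; B and C stand in for "column B" and "column C".
df <- data.frame(
  user   = c(1, 2, 1, 3, 3, 4, 4),
  number = c(0, 1, 2, 0, 1, 0, 2),
  B      = c(85, 165, 200, 15, 89, 78, 96),
  C      = c(200, 852, 1000, 689, 896, 582, 586)
)

default_days <- 200  # assumption: the "predefined number" from the example output

# Keep only users that have a row with number == 0, then sort by user and number
# so that the number == 0 row comes first within each user.
df <- df[df$user %in% df$user[df$number == 0], ]
df <- df[order(df$user, df$number), ]

# Hypothetical helper: apply the rules to the rows of a single user.
summarise_user <- function(rows) {
  has_followup <- nrow(rows) > 1 && rows$number[2] == 1
  data.frame(
    user   = rows$user[1],
    number = 0,
    status = if (has_followup) 1 else 0,
    days   = if (has_followup) rows$C[2] - rows$B[1] else default_days
  )
}

# One summary row per user, stacked into a single data frame.
result <- do.call(rbind, lapply(split(df, df$user), summarise_user))
rownames(result) <- NULL
print(result)
```

User 2 is dropped because it has no row with number == 0; users 1 and 4 get status 0 because their second row has number == 2.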

I'm asking this question because I'm a bit confused about the best way to do this. I thought about subsetting the data frame where number == 0 and then comparing users between the two data frames to find repeated users, but perhaps that is not the best way to do it.

Any pointers and ideas would be great!

Thank you for your help

EDIT

columnB is the number of days that have passed since '2002-01-01'. For instance, the first user entered the data frame 85 days after 2002-01-01.

columnC is the number of days that the user stayed in observation. After those days the user is no longer in the database.
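
As a small illustration of the columnB definition, a day offset can be turned into a calendar date with base R's Date arithmetic. The variable names origin and entry_date are made up for this sketch.

```r
# columnB counts days since 2002-01-01, so adding it to that date gives the entry date.
origin <- as.Date("2002-01-01")
entry_date <- origin + 85   # 85 is the columnB value of the first user
print(entry_date)           # 2002-03-27
```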

EDIT2

My dataset has 36 columns and 26075 rows.

After some suggestions, here's what I have so far.

#remove from dataset all users that don't have number == 0
df1 <- df[df$user %in% df[df$number == 0,1],]

df1["Status"] <- 0

#doesn't work
df1[df1$number == 1 %in% df1[duplicated(df1$user),]]

What I was trying to do is find all duplicated users and check whether their number is equal to 1, so that afterwards I could try something like this:

 df1[df1$number == 1 %in% df1[duplicated(df1$user),]] <- df1$Status == 1
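
One likely reason the attempt above fails is operator precedence: in R, %in% binds more tightly than ==, so the expression is read as df1$number == (1 %in% ...). A minimal sketch of the intended flagging, assuming the rows of each user are ordered so that the number == 0 row comes first, could look like this. The small df1 below is made-up sample data, and followup_users is a name introduced for the example.

```r
# Made-up data in the shape of df1 after users without number == 0 were removed.
df1 <- data.frame(
  user   = c(1, 1, 3, 3, 4, 4),
  number = c(0, 2, 0, 1, 0, 2)
)
df1$Status <- 0

# duplicated() marks the second and later rows of each user; combining it with
# number == 1 picks the users whose repeated row has number == 1.
followup_users <- df1$user[duplicated(df1$user) & df1$number == 1]

# Set Status to 1 on the number == 0 row of those users.
df1$Status[df1$user %in% followup_users & df1$number == 0] <- 1
print(df1)
```

Here only user 3 qualifies, so Status becomes 1 on that user's number == 0 row and stays 0 everywhere else.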

psoares · Accepted Answer

 #order the dataset by columns needed
 df2 <- df2[order(df2$user, df2$number),]

 ids <- unique(df2$user)

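 #create the status and days vectors, one entry per user
 status <- rep(NA_real_, length(ids))
 days <- rep(NA_real_, length(ids))
 #predefined number of days; 200 is an assumption taken from the question's example, adjust as needed
 end <- 200
 for (i in seq_along(ids)) {
     aux <- df2[df2$user %in% ids[i],]
     if (nrow(aux) == 1 && aux$number[1] == 0) {
          #user appears once, with number == 0
          status[i] <- 0
          days[i] <- end
     } else if (nrow(aux) > 1) {
          #repeated user; Event is not shown in the question, presumably a column of the full dataset
          status[i] <- 1
          days[i] <- aux$Event[2]
     } else {
          #user appears once without number == 0; removed further down
          status[i] <- NA
          days[i] <- NA
     }
 }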

 #merge the vector created with the database
 data <- data.frame(ids, status, days)
 df2 <- df2[!duplicated(df2$user),]
 df2 <- merge(df2, data, by.x="user", by.y="ids")

 #remove NAs from the dataset
 df.f <- df2[!(is.na(df2$days)),]
