Praneeth Munuganti
Praneeth Munuganti

Reputation: 116

R: Creating new dataframe based on multiple conditions on existing dataframe

I need to create a new dataframe using multiple conditions on an existing dataframe.

I tried using dplyr function, summarise in particular for multiple conditions but failed as the dataset size decreases once the conditions as applied.

For explanation, below is a simple sample of what I am trying to achieve.

df <- data.frame(User = c("Newton","Newton","Newton","Newton","Newton"),
                 Location = c("A","A","B","A","B"),
                 Movement = c(10,10,20,20,30),
                 Unit = c(-2,2,2,-2,-1),
                 Time = c("4-20-2019","4-20-2019","4-21-2019","4-21-2019"
                          ,"4-23-2019"))

dfNew <- data.frame(User = c("Newton","Newton","Newton"),
                    FromLocation = c("A","A","B"),
                    ToLocation = c("A","B","B"),
                    Movement = c(10,20,30),
                    Units = c(2,2,-1))

The conditions used to calculate dfNew are as follow:

  1. Looking at the first line of df: a) if movement is 10 and unit is negative - ignore this line

  2. Looking at the second line of df: a) if movement is 10 and unit is positive - FromLocation and ToLocation are both A, and Units is taken from df which is 2

  3. Looking at the third line of df: a) if movement is 20 and unit is positive - ToLocation (B) and Units (2) has to be taken from this line and FromLocation has to be taken from the next line

  4. Looking at the fourth line of df: a) if movement is 20 and unit is negative - FromLocation(A) for the previous line of dfnew has to be taken from this line

  5. Looking at the fifth line of df: a) if movement type is 30, then ToLocation and FromLocation will both be B and the units will be the same as df which is -1

Another pattern that could be useful is that each movement would occur on the same day/time. Also please do note that the example is for only 1 user, I have more than 2000 for which similar conditions have to be applied.

Like I said, I tried using dplyr and summarise to put all these conditions in but since the size of dataset is different I could find a way to make it work.

Appreciate any advice, thank you!

Upvotes: 1

Views: 1190

Answers (1)

Jon Spring
Jon Spring

Reputation: 66480

It sounds like dplyr::group_by and case_when might suffice, but I'm not sure these are the right interpretations of the "rules" for your table.

library(dplyr)
df %>%
  group_by(User) %>%
  mutate(FromLocation = case_when(Movement == 10 & Unit < 0 ~  "DROP",
                                  Movement == 10 & Unit > 0 ~  Location,
                                  Movement == 20 & Unit < 0 ~  lag(Location),
                                  Movement == 20 & Unit > 0 ~  lead(Location),
                                  Movement == 30  ~ "B",
                                  TRUE   ~ "not specified in rules"),

         ToLocation = case_when(Movement == 10 & Unit < 0 ~  "DROP",
                                Movement == 10 & Unit > 0 ~  Location,
                                Movement == 20 & Unit < 0 ~  lag(Location), # Not given 
                                Movement == 20 & Unit > 0 ~  Location,
                                Movement == 30  ~ "B",
                                TRUE   ~ "not specified in rules")) %>%
  ungroup() %>%
  filter(FromLocation != "DROP") %>%
  select(User, FromLocation, ToLocation, Movement, Unit)

Results

# A tibble: 4 x 5
  User   FromLocation ToLocation Movement  Unit
  <chr>  <chr>        <chr>         <dbl> <dbl>
1 Newton A            A                10     2
2 Newton A            B                20     2
3 Newton B            B                20    -2
4 Newton B            B                30    -1

Upvotes: 1

Related Questions