Reputation: 116
I need to create a new dataframe using multiple conditions on an existing dataframe.
I tried using dplyr function, summarise in particular for multiple conditions but failed as the dataset size decreases once the conditions as applied.
For explanation, below is a simple sample of what I am trying to achieve.
df <- data.frame(User = c("Newton","Newton","Newton","Newton","Newton"),
Location = c("A","A","B","A","B"),
Movement = c(10,10,20,20,30),
Unit = c(-2,2,2,-2,-1),
Time = c("4-20-2019","4-20-2019","4-21-2019","4-21-2019"
,"4-23-2019"))
dfNew <- data.frame(User = c("Newton","Newton","Newton"),
FromLocation = c("A","A","B"),
ToLocation = c("A","B","B"),
Movement = c(10,20,30),
Units = c(2,2,-1))
The conditions used to calculate dfNew are as follow:
Looking at the first line of df: a) if movement is 10 and unit is negative - ignore this line
Looking at the second line of df: a) if movement is 10 and unit is positive - FromLocation and ToLocation are both A, and Units is taken from df which is 2
Looking at the third line of df: a) if movement is 20 and unit is positive - ToLocation (B) and Units (2) has to be taken from this line and FromLocation has to be taken from the next line
Looking at the fourth line of df: a) if movement is 20 and unit is negative - FromLocation(A) for the previous line of dfnew has to be taken from this line
Looking at the fifth line of df: a) if movement type is 30, then ToLocation and FromLocation will both be B and the units will be the same as df which is -1
Another pattern that could be useful is that each movement would occur on the same day/time. Also please do note that the example is for only 1 user, I have more than 2000 for which similar conditions have to be applied.
Like I said, I tried using dplyr and summarise to put all these conditions in but since the size of dataset is different I could find a way to make it work.
Appreciate any advice, thank you!
Upvotes: 1
Views: 1190
Reputation: 66480
It sounds like dplyr::group_by
and case_when
might suffice, but I'm not sure these are the right interpretations of the "rules" for your table.
library(dplyr)
df %>%
group_by(User) %>%
mutate(FromLocation = case_when(Movement == 10 & Unit < 0 ~ "DROP",
Movement == 10 & Unit > 0 ~ Location,
Movement == 20 & Unit < 0 ~ lag(Location),
Movement == 20 & Unit > 0 ~ lead(Location),
Movement == 30 ~ "B",
TRUE ~ "not specified in rules"),
ToLocation = case_when(Movement == 10 & Unit < 0 ~ "DROP",
Movement == 10 & Unit > 0 ~ Location,
Movement == 20 & Unit < 0 ~ lag(Location), # Not given
Movement == 20 & Unit > 0 ~ Location,
Movement == 30 ~ "B",
TRUE ~ "not specified in rules")) %>%
ungroup() %>%
filter(FromLocation != "DROP") %>%
select(User, FromLocation, ToLocation, Movement, Unit)
Results
# A tibble: 4 x 5
User FromLocation ToLocation Movement Unit
<chr> <chr> <chr> <dbl> <dbl>
1 Newton A A 10 2
2 Newton A B 20 2
3 Newton B B 20 -2
4 Newton B B 30 -1
Upvotes: 1