Reputation: 23
Good night! I'm brazilian and I don't speak english very well. I work a database with more than 10000 rows, represented by the example below.
df <- data.frame(
PROCESS = c(180022121, 180022121, 180022105, 180022105, 180022097, 180022097, 180022097, 180022501, 180022501),
NAME = c("A_NONIMATO", "B_NONIMATO", "C_NONIMATO", "C_NONIMATO", "D_NONIMATO", "E_NONIMATO", "F_NONIMATO", "G_NONIMATO", "G_NONIMATO"),DATE = c("02/01/2018", "02/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "02/01/2018", "02/01/2018"),
CRIME = c("ART.33", "ART.33", "ART.35", "ART.33", "ART.155", "ART.155", "ART.155", "ART.157", "ART.14CP"))
HYPOTHESIS:
a) same PROCESS
, different PARTNAME
, same CRIME
(lines 1 and 2)
b) same PROCESS
, same PARTNAME
, different CRIME
(lines 3 and 4)
c) same PROCESS
, same PARTNAME
, different CRIME
(lines 8 and 9)
My question concerns hypothesis c: I need to create a column CRIME2
to transfer ART.14CP
turning the case into just one line… without moving to hypothesis b. Throughout the database will always be written ART.14CP
.
To look like this:
df2 <- data.frame(
PROCESS = c(180022121, 180022121, 180022105, 180022105, 180022097, 180022097, 180022097, 180022501),
NAME = c("A_NONIMATO", "B_NONIMATO", "C_NONIMATO", "C_NONIMATO", "D_NONIMATO", "E_NONIMATO", "F_NONIMATO", "G_NONIMATO"),
DATE = c("02/01/2018", "02/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "01/01/2018", "02/01/2018"),
CRIME = c("ART.33", "ART.33", "ART.35", "ART.33", "ART.155", "ART.155", "ART.155", "ART.157"),
CRIME2 = c("", "", "", "", "", "", "", "ART.14CP"))
Thank you very much for your attention.
Upvotes: 0
Views: 73
Reputation: 1057
Here's a slight modification of Ronak's answer using case_when and another way of approaching the logic. I prefer case_when to if or ifelse since it's more intuitive to others reading your code, especially if you need to use multiple conditions or have numerous cases. The TRUE condition will capture any remaining cases; if you don't put it, those cases be set to NA by default.
library(dplyr)
df %>%
group_by(PROCESS, NAME) %>%
mutate(CRIME2 = case_when(
"ART.14CP" %in% CRIME ~ "ART.14CP",
TRUE ~ ""
)) %>%
filter(CRIME != "ART.14CP")
Upvotes: 1
Reputation: 388797
If you want to only have values in CRIME2
where CRIME == "ART.14CP"
one way using dplyr
for each group (PROCESS
and NAME
) assign value "ART.14CP"
only if it is present or else assign an empty value. We remove the duplicate row entry for "ART.14CP"
using filter
.
library(dplyr)
df %>%
group_by(PROCESS, NAME) %>%
mutate(CRIME2 = if (any(CRIME == "ART.14CP")) "ART.14CP" else "") %>%
filter(CRIME != "ART.14CP")
# PROCESS NAME DATE CRIME CRIME1
# <dbl> <fct> <fct> <fct> <chr>
#1 180022121 A_NONIMATO 02/01/2018 ART.33 ""
#2 180022121 B_NONIMATO 02/01/2018 ART.33 ""
#3 180022105 C_NONIMATO 01/01/2018 ART.35 ""
#4 180022105 C_NONIMATO 01/01/2018 ART.33 ""
#5 180022097 D_NONIMATO 01/01/2018 ART.155 ""
#6 180022097 E_NONIMATO 01/01/2018 ART.155 ""
#7 180022097 F_NONIMATO 01/01/2018 ART.155 ""
#8 180022501 G_NONIMATO 02/01/2018 ART.157 ART.14CP
Upvotes: 2