Reputation: 97
I would like to change an NA value on the third observation of companyID 1 in column 2 (value) to the character "F". My dataset is huge, so will not be able to say which exact row this observation is on but I will know that it is on the third observation of the company ID for example. I hope this makes sense. Then, I would like fourth observation of companyID 2 in column 2 (value) to be "X". Then, I would like the second observation of companyID 3 in column 2 (value) to be "B", and so on. Each value is different and will lie on a different row of a companyID.
I found the following question which is somewhat similar, but it is unable to give me what I want How to replace certain values in a specific rows and columns with NA in R? .
Here is an example of what I have:
companyID value
1 NA
1 NA
1 NA
1 NA
1 NA
2 NA
2 NA
2 NA
2 NA
2 NA
3 NA
3 NA
3 NA
3 NA
3 NA
And I would like to get the following output:
companyID value
1 NA
1 NA
1 F
1 NA
1 NA
2 NA
2 NA
2 NA
2 X
2 NA
3 NA
3 B
3 NA
3 NA
3 NA
Super appreciative of your help!
Upvotes: 0
Views: 486
Reputation: 1101
A solution using data.table
library(data.table)
df <- read.table(header = TRUE,text="companyID value
1 NA
1 NA
1 NA
1 NA
1 NA
2 NA
2 NA
2 NA
2 NA
2 NA
3 NA
3 NA
3 NA
3 NA
3 NA")
setDT(df)
df[, value := as.character(value)] # Converting column to character
df[, dummy := 1:.N, companyID] # A dummy column with indices per companyID
mapply(function(x, y, z) { # "multi"-applying replacement function
df[companyID == x & dummy == y, value := z]
NULL},
c(1, 2, 3), # x are the companyIDs
c(3, 4, 2), # y are dummy indices to be replaced
c("F", "X", "B") # z are the replacements
)
df$dummy <- NULL #Bye dummy variable
df
companyID value
1: 1 <NA>
2: 1 <NA>
3: 1 F
4: 1 <NA>
5: 1 <NA>
6: 2 <NA>
7: 2 <NA>
8: 2 <NA>
9: 2 X
10: 2 <NA>
11: 3 <NA>
12: 3 B
13: 3 <NA>
14: 3 <NA>
15: 3 <NA>
Upvotes: 0
Reputation:
Try something like this. Add in a rowID
and then create a lookup data frame with your replacements. Then you can just left_join()
in the new values.
The lookup_df
replaces your "if this company and this row then new value" logic.
library(dplyr)
df <- tibble(companyID = c(rep(1, 5), rep(2, 6)), value = NA_character_)
lookup_df <- tibble(companyID = c(1, 2), rowID = c(3, 4), valueNew = c("F", "D"))
df %>%
group_by(companyID) %>%
mutate(rowID = row_number()) %>%
left_join(lookup_df, by = c("companyID", "rowID")) %>%
mutate(value = coalesce(value, valueNew)) %>%
select(companyID, value)
result:
# A tibble: 11 x 2
# Groups: companyID [2]
companyID value
<dbl> <chr>
1 1 NA
2 1 NA
3 1 F
4 1 NA
5 1 NA
6 2 NA
7 2 NA
8 2 NA
9 2 D
10 2 NA
11 2 NA
Upvotes: 2
Reputation: 11548
Are you looking at a solution like this:
library(dplyr)
df %>% group_by(companyID) %>%
mutate(value = case_when(row_number() == 3 ~ 'F', TRUE ~ value))
# A tibble: 10 x 2
# Groups: companyID [2]
companyID value
<dbl> <chr>
1 1 NA
2 1 NA
3 1 F
4 1 NA
5 1 NA
6 2 NA
7 2 NA
8 2 F
9 2 NA
10 2 NA
Upvotes: 0