Reputation: 13
I am very new to R, and I want to do the following:
I have a data frame that consists of ID, Col1, Col2, Col3
columns.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text="
ID Col1 Col2 Col3
1 0 'Less than once a month' 0
2 Never 0 0
3 0 0 'Once a month'
")
I want to merge those 3 columns into one, where if there is "Never"
and 0
in the other columns the value is "Never"
, if there is "Once a month"
and the rest are 0
, then "Once a month"
and so on. All columns are mutually exclusive meaning there cannot be "Never"
and "Once a month"
in the same raw.
//I tried to apply this loop:
for (val in df) {
if(df$Col1 == "Never" && df$Col2 == "0")
{
df$consolidated <- "Never"
} else (df$`Col1 == "0" && df$Col2 == "Less than once a month")
{
how_oft_purch_gr_pers$consolidated <- "Less than once a month"
}
}
I wanted to figure first for two columns only, but it didn't work, as all raws in the consolidated column are filled with "Less than once a month".
I want it to be like this:
ID Col1 Col2 Col3 Consolidated
1 0 Less than once a month 0 Less than once a month
2 Never 0 0 Never
3 0 0 Once a month Once a month
Any hint on what am I doing wrong?
Thank you in advance
Upvotes: 1
Views: 3366
Reputation: 3235
This is a possiblity using base R
Start your result column. Initialize it with only "0"
.
df$coalesced <- "0"
Loop over some columns of df
(Col1--Col3). Use drop = FALSE
in case you might only use one column, because R would output a vector in that case and for would loop over the elements of that vector and not over the single column in that case.
for( column in d[, c("Col1","Col2","Col3"), drop = FALSE]){
This checks each of coalesced
if it is already filled, and if not (if it is "0"
it fill it with the current column (which may also be "0"
)
df$coalesced <- ifelse(df$coalesced == "0", column, df$coalesced)
}
Add the new column to your data frame
df$coalesced <- coalesced
Upvotes: 0
Reputation: 3235
Even though @MKR has written a good answer, I want to point out a few errors in your code which might be the reason why it does not work
for (val in df) {
You problably want to loop over all rows of df
. However, in fact you are looping over columns of your data frame. The reason is that a data frame is a list of vectors (your columns) which all must have the same length. With your code you iterate over the elements of df
, which is the columns. See Q&A For each row in data.frame
if(df$Col1 == "Never" && df$Col2 == "0"){
Note that when using the double &&
instead of &
, R is looking only at the first element of the vector you give it. See for example Q&A Boolean Operators && and ||
df$consolidated <- "Never"
Here, you set the whole column consolidated
of df
to "Never"
, because you do not use the iteration var
from above (even if it would stand for one df
row which it does not, like you wrote it).
} else (df$`Col1 == "0" && df$Col2 == "Less than once a month"){
You need to use else if(...)
, not else (...)
. Like you wrote it, R will think the statement in (....)
should be executed if the if(...)
above is not true and the statement in {...}
after the if would be regarded by R as having nothing to do with the if... else...
construct, because it already executed (...)
. So it will execute the {...}
block always, regardless of what is the outcome of the above if(...)
.
Is df$`Col1
a typo? The backtick `
should only occur in pairs and can be used around variables (also column names)
df$consolidated <- "Less than once a month"
Here you again set a whole column to one value, like explained above.
}
}
Upvotes: 0
Reputation: 20085
You can think of using dplyr::coalesce
after replacing 0
with NA. The coalesce()
finds the first non-missing value (in a row in this case) and creates a new column. The solution can be as:
library(dplyr)
df %>% mutate_at(vars(starts_with("Col")), funs(na_if(.,"0"))) %>%
mutate(Consolidated = coalesce(Col1,Col2,Col3)) %>%
select(ID, Consolidated)
# OR in concise way once can simply write as
bind_cols(df[1], Consolidated = coalesce(!!!na_if(df[-1],"0")))
# ID Consolidated
# 1 1 Less than once a month
# 2 2 Never
# 3 3 Once a month
Data:
df <- read.table(text =
"ID Col1 Col2 Col3
1 0 'Less than once a month' 0
2 Never 0 0
3 0 0 'Once a month'",
stringsAsFactors = FALSE, header = TRUE)
Upvotes: 3