Joe Dickens
Joe Dickens

Reputation: 13

Merging multiple columns in a dataframe based on condition in R

I am very new to R, and I want to do the following:

I have a data frame that consists of ID, Col1, Col2, Col3 columns.

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text="
ID Col1    Col2                        Col3             
1  0       'Less than once a month'    0               
2  Never   0                           0              
3  0       0                           'Once a month'
")

I want to merge those 3 columns into one, where if there is "Never" and 0 in the other columns the value is "Never", if there is "Once a month" and the rest are 0, then "Once a month" and so on. All columns are mutually exclusive meaning there cannot be "Never" and "Once a month" in the same raw.

 //I tried to apply this loop:

         for (val in df) {
if(df$Col1 == "Never" && df$Col2 == "0")
  {
  df$consolidated <- "Never"
  } else (df$`Col1 == "0" && df$Col2 == "Less than once a month")
  {
  how_oft_purch_gr_pers$consolidated <- "Less than once a month"
  }
}

I wanted to figure first for two columns only, but it didn't work, as all raws in the consolidated column are filled with "Less than once a month".

I want it to be like this:

ID Col1    Col2                       Col3             Consolidated
1  0       Less than once a month       0              Less than once a month
2  Never   0                            0              Never
3  0       0                            Once a month   Once a month

Any hint on what am I doing wrong?

Thank you in advance

Upvotes: 1

Views: 3366

Answers (3)

akraf
akraf

Reputation: 3235

This is a possiblity using base R

Start your result column. Initialize it with only "0".

df$coalesced <- "0"

Loop over some columns of df (Col1--Col3). Use drop = FALSE in case you might only use one column, because R would output a vector in that case and for would loop over the elements of that vector and not over the single column in that case.

for( column in d[, c("Col1","Col2","Col3"), drop = FALSE]){

This checks each of coalesced if it is already filled, and if not (if it is "0" it fill it with the current column (which may also be "0")

    df$coalesced <- ifelse(df$coalesced == "0", column, df$coalesced)

}

Add the new column to your data frame

df$coalesced <- coalesced

Upvotes: 0

akraf
akraf

Reputation: 3235

Even though @MKR has written a good answer, I want to point out a few errors in your code which might be the reason why it does not work

for (val in df) {

You problably want to loop over all rows of df. However, in fact you are looping over columns of your data frame. The reason is that a data frame is a list of vectors (your columns) which all must have the same length. With your code you iterate over the elements of df, which is the columns. See Q&A For each row in data.frame

  if(df$Col1 == "Never" && df$Col2 == "0"){

Note that when using the double && instead of &, R is looking only at the first element of the vector you give it. See for example Q&A Boolean Operators && and ||

    df$consolidated <- "Never"

Here, you set the whole column consolidated of df to "Never", because you do not use the iteration var from above (even if it would stand for one df row which it does not, like you wrote it).

  } else (df$`Col1 == "0" && df$Col2 == "Less than once a month"){
  • You need to use else if(...), not else (...). Like you wrote it, R will think the statement in (....) should be executed if the if(...) above is not true and the statement in {...} after the if would be regarded by R as having nothing to do with the if... else... construct, because it already executed (...). So it will execute the {...} block always, regardless of what is the outcome of the above if(...).

  • Is df$`Col1 a typo? The backtick ` should only occur in pairs and can be used around variables (also column names)

    df$consolidated <- "Less than once a month"

Here you again set a whole column to one value, like explained above.

  } 
}

Upvotes: 0

MKR
MKR

Reputation: 20085

You can think of using dplyr::coalesce after replacing 0 with NA. The coalesce() finds the first non-missing value (in a row in this case) and creates a new column. The solution can be as:

library(dplyr)

df %>% mutate_at(vars(starts_with("Col")), funs(na_if(.,"0"))) %>%
  mutate(Consolidated = coalesce(Col1,Col2,Col3)) %>%
  select(ID, Consolidated)

# OR in concise way once can simply write as
bind_cols(df[1], Consolidated = coalesce(!!!na_if(df[-1],"0")))

#   ID           Consolidated
# 1  1 Less than once a month
# 2  2                  Never
# 3  3           Once a month

Data:

df <- read.table(text = 
"ID Col1    Col2                       Col3             
1  0       'Less than once a month'       0               
2  Never   0                            0              
3  0       0                            'Once a month'",
stringsAsFactors = FALSE, header = TRUE)

Upvotes: 3

Related Questions