bmonger

Reputation: 77

Loop calling elements of a vector list

I have a data frame with some ambiguous observation names in it and want to add a classification to them. One problem is that some observation names match several of the classes I want to assign, so I decided to add one column per class and fill it with True/False depending on whether an observation belongs to that class.

Here is an example data frame:

col1 <- c(1:8)
col2 <- c("aa", "bb", "ab", "ba")
df <- data.frame(col1,col2)

which gives:

   col1 col2
1     1   aa
2     2   bb
3     3   ab
4     4   ba
5     5   aa
6     6   bb
7     7   ab
8     8   ba

And the Class criteria vectors I have are:

Class1 <- "aa"                  # A Only
Class2 <- "bb"                  # B Only
Class3 <- c("ab", "ba")         # Diff symbols
Class4 <- c("ab", "ba", "aa")   # All A
Class5 <- c("ab", "ba", "bb")   # All B 

I intended to solve my problem with a loop that, on each iteration, populates a new column in the data frame by matching the col2 value against the criteria in the respective vector.

Classes <- list(Class1, Class2, Class3, Class4, Class5)
ClassName <- c("A Only", "B Only", "Diff symbols", "All A", "All B")

for (i in 1:length(ClassName)){
    df[df$col2 %in% Classes[i], 2 + i] <- "x"
}
names(df)[3:7] <- ClassName

This is where the problem is: only the length-one vectors work properly in the loop.

  col1 col2 A Only B Only Diff symbols All A All B
1    1   aa      x   <NA>         <NA>  <NA>  <NA>
2    2   bb   <NA>      x         <NA>  <NA>  <NA>
3    3   ab   <NA>   <NA>         <NA>  <NA>  <NA>
4    4   ba   <NA>   <NA>         <NA>  <NA>  <NA>
5    5   aa      x   <NA>         <NA>  <NA>  <NA>
6    6   bb   <NA>      x         <NA>  <NA>  <NA>
7    7   ab   <NA>   <NA>         <NA>  <NA>  <NA>
8    8   ba   <NA>   <NA>         <NA>  <NA>  <NA>

Class3 to Class5 produce no results for some reason, even though they work fine when used outside the loop, for example:

df[df$col2 %in% Class3, 5] <- "x"

  col1 col2 A Only B Only Diff symbols All A All B
1    1   aa      x   <NA>         <NA>  <NA>  <NA>
2    2   bb   <NA>      x         <NA>  <NA>  <NA>
3    3   ab   <NA>   <NA>            x  <NA>  <NA>
4    4   ba   <NA>   <NA>            x  <NA>  <NA>
5    5   aa      x   <NA>         <NA>  <NA>  <NA>
6    6   bb   <NA>      x         <NA>  <NA>  <NA>
7    7   ab   <NA>   <NA>            x  <NA>  <NA>
8    8   ba   <NA>   <NA>            x  <NA>  <NA>

I take it something is wrong with the way I use the list, but I cannot find an answer.

I would really appreciate somebody sharing an insight!

Upvotes: 1

Views: 1524

Answers (2)

Richard Ambler

Reputation: 5030

Be careful with how you use the brackets [ and [[ to index lists. Use [ to return a new list containing the selected elements, and [[ to return the object actually stored at the selected index.

For example, using your code:

> Classes[3] # returns a list
[[1]]
[1] "ab" "ba"

> Classes[[3]] # returns a vector
[1] "ab" "ba"

By using the double-brackets, i.e., changing your loop-code to:

for (i in 1:length(ClassName)) df[df$col2 %in% Classes[[i]], 2 + i] <- "x"

df changes to:

> df
  col1 col2 A Only B Only Diff symbols All A All B
1    1   aa      x   <NA>         <NA>     x  <NA>
2    2   bb   <NA>      x         <NA>  <NA>     x
3    3   ab   <NA>   <NA>            x     x     x
4    4   ba   <NA>   <NA>            x     x     x
5    5   aa      x   <NA>         <NA>     x  <NA>
6    6   bb   <NA>      x         <NA>  <NA>     x
7    7   ab   <NA>   <NA>            x     x     x
8    8   ba   <NA>   <NA>            x     x     x

Of course, there are other ways that might be better suited (e.g., easier to read) to what you want to do. For example:

df$contains.a <- grepl("a", df$col2)

Or, if you want "x" or another value as the marker:

df$contains.a <- ifelse(grepl("a", df$col2), "x", NA)
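
If the goal is the True/False column per class described in the question, one possible sketch is to keep the classes in a named list and fill all the columns in one step. The names `classes` and `result` are illustrative and not from the question, and `col2` is written out in full here instead of relying on recycling:

```r
# Example data from the question
df <- data.frame(col1 = 1:8,
                 col2 = rep(c("aa", "bb", "ab", "ba"), times = 2),
                 stringsAsFactors = FALSE)

# Hypothetical named list: each name becomes a column name,
# each value is the set of col2 entries belonging to that class
classes <- list("A Only"       = "aa",
                "B Only"       = "bb",
                "Diff symbols" = c("ab", "ba"),
                "All A"        = c("ab", "ba", "aa"),
                "All B"        = c("ab", "ba", "bb"))

# lapply() hands each element to the function as a plain vector,
# so %in% sees the vector itself and not a one-element list
result <- df
result[names(classes)] <- lapply(classes, function(values) df$col2 %in% values)

print(result)
```

Because `lapply` passes each list element directly, the `[` versus `[[` question does not come up, and the column names come from the list itself rather than from a separate `ClassName` vector.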

Upvotes: 2

Jthorpe

Reputation: 10167

The problem is that Classes is a list, and the single-bracket operator ([) returns a list, not the object contained in it. It just so happens that the %in% operator does what you expect when the item in the list has one element (Class1, for example), but not when the item is longer (e.g. Class3). Specifically, df$col2 %in% Classes[i] tests whether each element of df$col2 equals one of the elements of the list Classes[i]. That list has a single element, the whole vector, which %in% compares as one string, so nothing can match when Classes[[i]] has length greater than 1.

The solution is to change Classes[i] to Classes[[i]] in the line df[df$col2 %in% Classes[i], 2 + i] <- "x".
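
The difference can be checked directly at the console. The following is a small sketch using a shortened version of the question's list; the variable names `as_list`, `as_vector`, `in_list`, `in_vector` and `in_first` are made up for illustration:

```r
Class3  <- c("ab", "ba")
Classes <- list("aa", "bb", Class3)
col2    <- c("aa", "bb", "ab", "ba")

# Single brackets keep the list wrapper; double brackets extract the vector
as_list   <- Classes[3]
as_vector <- Classes[[3]]
print(class(as_list))     # "list"
print(class(as_vector))   # "character"

# %in% is built on match(), which converts a list to character strings,
# so the two-element vector collapses into a single string
print(as.character(as_list))

in_list   <- col2 %in% as_list     # no element of col2 equals that string
in_vector <- col2 %in% as_vector   # TRUE for "ab" and "ba"

# A length-one element converts to itself, which is why Classes[1] "works"
in_first <- col2 %in% Classes[1]   # TRUE for "aa" only

print(rbind(in_list, in_vector, in_first))
```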

Upvotes: 1
