Reputation: 193
For several days I have been trying to find a way to subset my data frame by comparing a character in one column with a string in another column.
If the character is not within the string, I want to copy a value to a new column. I searched high and low and tried many examples, but for some reason I cannot get it to work on my data frame.
df <- structure(list(POLY = c("K3", "K3", "K3", "K4", "K4", "K4", "K4",
"K6", "K6", "K7", "K7", "K7", "L1", "L1", "L1"), FIX = c("O",
"K", "M", "M", "K", "O", "L", "K", "M", "K", "O", "M", "M", "L",
"O"), SESSTIME = c(310, 190, 181, 188, 151, 260, 268, 200, 259,
245, 180, 188, 259, 199, 244), CODE = c("KO", "KO", "KO", "KM",
"KM", "KM", "KM", "KM", "KM", "KO", "KO", "KO", "LMO", "LMO",
"LMO")), .Names = c("POLY", "FIX", "SESSTIME", "CODE"), row.names = c(42L,
44L, 46L, 115L, 116L, 117L, 133L, 225L, 231L, 269L, 270L, 328L,
420L, 425L, 431L), class = "data.frame")
This is what part of it looks like:
row.names POLY FIX SESSTIME CODE SESSTIME2
1 42 K3 O 310 KO NA
2 44 K3 K 190 KO NA
3 46 K3 M 181 KO ...
4 115 K4 M 188 KM
5 116 K4 K 151 KM
6 117 K4 O 260 KM NA
7 133 K4 L 268 KM 268
8 225 K6 K 200 KM NA
9 231 K6 M 259 KM
10 269 K7 K 245 KO
11 270 K7 O 180 KO
12 328 K7 M 188 KO 188
13 420 L1 M 259 LMO
14 425 L1 L 199 LMO
15 431 L1 O 244 LMO
So when FIX is not in CODE, the value of SESSTIME should be copied to SESSTIME2 (a column already prepopulated with NA).
For example, I tried
df$FIX %in% strsplit(as.character(df$CODE,""))
or similar, but the comparison is always TRUE.
All the examples I found compare a single hardcoded character, e.g. "K", with a vector such as c("K","L","M"). None of them shows how to apply this across the columns and rows of a data frame.
I'm getting a little bit nervous ...
Does anyone have an idea what I'm doing wrong?
UPDATE:
Thanks to the answer below, my code now looks like this and does what I need:
# Rows where FIX does not occur in CODE and SESSTIME2 is still NA
idx <- !(mapply(function(i, j) length(grep(i, j)), df$FIX, df$CODE)) & is.na(df$SESSTIME2)
df$SESSTIME2[idx] <- df$SESSTIME[idx]
Upvotes: 0
Views: 2497
Reputation: 60452
The reason your code doesn't work is that
strsplit(as.character(df$CODE,""))
returns a list, so %in% cannot compare each FIX value with the CODE of its own row. Instead, use mapply to check for a match row by row.
Here we use grep, which allows more flexible (regular-expression) character matching:
# The values of FIX & CODE are passed to i and j
mapply(function(i, j) length(grep(i, j)), df$FIX, df$CODE)
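To make the result of that call concrete, here is a minimal sketch on a few values taken from the question's data. The vector names fix, code, hits and absent are only illustrative, and fixed = TRUE is an optional addition (not part of the call above) that makes grep treat the letter literally instead of as a regular expression.

```r
# Small stand-in vectors copied from rows of the question's data
fix  <- c("O", "K", "M", "L")
code <- c("KO", "KO", "KO", "KM")

# One count per row: 1 if the FIX letter occurs in that row's CODE, else 0.
# fixed = TRUE (an assumption added here) matches the letter literally.
hits <- mapply(function(i, j) length(grep(i, j, fixed = TRUE)), fix, code)

# TRUE for rows where FIX is absent from CODE
absent <- unname(hits == 0)
absent
```

Here absent is FALSE FALSE TRUE TRUE: "M" does not occur in "KO" and "L" does not occur in "KM". Negating the counts with !, as in the question's update, gives the same logical vector.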
Alternatively, using %in%:
## Suggested by akrun
mapply('%in%', df$FIX,strsplit(as.character(df$CODE), ''))
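As a sketch of how the %in% variant could be used for the actual copy step, the example below rebuilds four rows of the question's data (original rows 133, 225, 328 and 420) and fills SESSTIME2 only where FIX is missing from CODE. The helper names found and idx are hypothetical, and the small data frame is a stand-in for the full one.

```r
# Four rows copied from the question's data
df <- data.frame(
  FIX      = c("L", "K", "M", "M"),
  CODE     = c("KM", "KM", "KO", "LMO"),
  SESSTIME = c(268, 200, 188, 259),
  stringsAsFactors = FALSE
)
df$SESSTIME2 <- NA_real_  # prepopulated with NA, as in the question

# TRUE where the FIX letter is one of the letters of that row's CODE
found <- unname(mapply(`%in%`, df$FIX, strsplit(df$CODE, "")))

# Copy SESSTIME only where FIX was not found and SESSTIME2 is still NA
idx <- !found & is.na(df$SESSTIME2)
df$SESSTIME2[idx] <- df$SESSTIME[idx]

df$SESSTIME2
```

For these four rows SESSTIME2 ends up as 268 NA 188 NA, which agrees with the values shown for those rows in the question's table.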
Upvotes: 2