Reputation: 316
I'm new to R and this forum so apologies for a rather fundamental question.
I have a bunch of columns (i.e. variables, because it's a data frame) where the colnames all start out with the same name but end on a distinct number, say: variable_0, variable_1, and so on up to 12.
For each of those columns each row contains numbers, again from zero to twelve.
I'm interested in finding (for each row) the value at which the trailing number of the colname matches the value of that particular variable:
v_0 v_1 v_2 v_3
1 2 2 2
1 2 3 3
In this example, I would want a new variable x that equals 2 for row 1 (because v_i=i only for i=2) and 3 for row 2.
Ideally, the code would also handle the case where a row has more than one match: create a variable y that is one if the number of matches exceeds one and zero otherwise, and set x to the first match.
Help is greatly appreciated! Thank you!
Upvotes: 2
Views: 1301
Reputation: 31171
Try this:
trail = as.numeric(gsub(".*_([0-9]*)","\\1",names(df)))
df$x = apply(df, 1, function(u) if(all(trail!=u)) NA else trail[match(TRUE,trail==u)])
#> df
# v_0 v_1 v_2 v_3 x
#1 1 2 2 2 2
#2 1 2 3 3 3
This also works on messier data, where a row has several matches or none at all:
df = data.frame(v_0=c(1,1,2), v_1=c(1,2,5), v_2=2:4, v_3=2:4)
# v_0 v_1 v_2 v_3
#1 1 1 2 2
#2 1 2 3 3
#3 2 5 4 4
df$x = apply(df, 1, function(u) if(all(trail!=u)) NA else trail[match(TRUE,trail==u)])
# v_0 v_1 v_2 v_3 x
#1 1 1 2 2 1
#2 1 2 3 3 3
#3 2 5 4 4 NA
trail contains the trailing number of each column (I supposed the key delimiter is _). Then for each row we check which value equals the trailing number of its column (we use apply to loop; the second argument 1 indicates we loop over the rows, 2 is for the columns). If there is no match, we return NA. If there is one or more, we take the first match.
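The question also asked for a flag y that marks rows with more than one match, which the code above does not produce. A minimal sketch of one way to add it, assuming the same sample data and the _ delimiter; the names hits and first_match are illustrative and not part of the original answer:

```r
# Sample data from above: row 1 has two matches, row 3 has none.
df <- data.frame(v_0 = c(1, 1, 2), v_1 = c(1, 2, 5), v_2 = 2:4, v_3 = 2:4)

# Trailing number of each column name (assumes names like v_0, v_1, ...).
trail <- as.numeric(sub(".*_", "", names(df)))

# Per row, count the columns whose value equals their trailing number.
# This runs before x is added, so df still holds only the v_* columns.
hits <- apply(df, 1, function(u) sum(u == trail))

# first_match is a hypothetical helper: the first matching trailing
# number in a row, or NA when match() finds no TRUE.
first_match <- function(u) trail[match(TRUE, u == trail)]
x <- apply(df, 1, first_match)

df$x <- x
df$y <- as.integer(hits > 1)  # 1 if more than one match, else 0
```

Both x and hits are computed before either column is attached to df, so the row vectors passed to apply always have the same length as trail.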
Upvotes: 3
Reputation: 887108
Another option using @ColonelBeauvel's data is
trail <- as.numeric(sub('[^0-9]+', '', names(df)))
indx <- df==trail[col(df)]
df$x <- trail[max.col(indx, 'first')* NA^!rowSums(indx)]
df
# v_0 v_1 v_2 v_3 x
#1 1 1 2 2 1
#2 1 2 3 3 3
#3 2 5 4 4 NA
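The one-liner above packs several steps together, so here is a hedged, unpacked sketch of the same idea. It also adds the y flag the question asked for. The intermediate names n_match and first_col are illustrative, and the NA^!rowSums(indx) trick is swapped for an explicit assignment:

```r
df <- data.frame(v_0 = c(1, 1, 2), v_1 = c(1, 2, 5), v_2 = 2:4, v_3 = 2:4)
trail <- as.numeric(sub("[^0-9]+", "", names(df)))

# col(df) gives each cell's column index, so trail[col(df)] pairs
# every cell with the trailing number of its own column.
indx <- df == trail[col(df)]

# Number of matches in each row.
n_match <- rowSums(indx)

# max.col(..., "first") returns the first column holding the row maximum.
# For a row with no match that is column 1, so such rows are set to NA.
first_col <- max.col(indx, ties.method = "first")
first_col[n_match == 0] <- NA

df$x <- trail[first_col]
df$y <- as.integer(n_match > 1)  # 1 if more than one match, else 0
```

Because everything here works on the whole logical matrix at once, there is no per-row function call, which may matter on data frames with many rows.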
Upvotes: 0