Reputation: 1207
I have a column in a data frame, df$moves, which looks like this:
W1.e4 B1.d5 W2.c4 B2.e6 W3.Nc3 B3.Nf6 W4.cxd5 B4.exd5 W5.Bg5
W1.e4 B1.d5 W2.exd5 B2.Qxd5 W3.Nc3 B3.Qa5 W4.d4 B4.Nf6 W5.Nf3 B5.c6 W6.Ne5 B6.Bf5
W1.e4 B1.e5 W2.Nf3 B2.Nc6 W3.Bc4
W1.e4 B1.e5 W2.Nf3 B2.Nf6
W1.e4 B1.c5 W2.Nf3
I want to get a count of all unique values before the string "W2." appears. In the above, for example, I'd expect the count of unique values before "W2." to be 1, being the last row only, as up until "W2." row 1 is the same as row 2 and row 3 is the same as row 4.
How should this be done?
Upvotes: 0
Views: 65
Reputation: 83235
A possible approach is to extract the parts before W2:
# option 1:
vec <- substr(df$moves, 1, regexpr('W2\\.', df$moves) - 1)
# option 2:
vec <- sub('W2.*', '', df$moves)
and then see whether they are unique:
sum(!duplicated(vec) & !duplicated(vec, fromLast = TRUE))
which gives:
> sum(!duplicated(vec) & !duplicated(vec, fromLast = TRUE))
[1] 1
What this does:
regexpr('W2\\.', df$moves) returns the position where W2 first appears in each string.
Subtract 1 from those positions and feed the result to substr: substr(df$moves, 1, regexpr('W2\\.', df$moves) - 1) then gets the parts before W2.
Alternatively, use sub instead of the substr/regexpr combo: sub('W2.*', '', df$moves).
!duplicated(vec) & !duplicated(vec, fromLast = TRUE) indicates which elements of vec occur only once.
Wrapping that in sum gives the number of values before W2 that occur only once.
If you want to count the number of distinct values instead of the values that only appear once, you can use either sum(!duplicated(vec)) or length(unique(vec)).
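To make the steps above concrete, here is a small sketch that prints the intermediate values for the example data. The names moves, pos, prefix, once, n_once and n_distinct are illustrative and not part of the original answer.

```r
# Example data from the question
moves <- c("W1.e4 B1.d5 W2.c4 B2.e6 W3.Nc3 B3.Nf6 W4.cxd5 B4.exd5 W5.Bg5",
           "W1.e4 B1.d5 W2.exd5 B2.Qxd5 W3.Nc3 B3.Qa5 W4.d4 B4.Nf6 W5.Nf3 B5.c6 W6.Ne5 B6.Bf5",
           "W1.e4 B1.e5 W2.Nf3 B2.Nc6 W3.Bc4",
           "W1.e4 B1.e5 W2.Nf3 B2.Nf6",
           "W1.e4 B1.c5 W2.Nf3")

# Position of the first "W2." in each string (-1 if it is absent)
pos <- regexpr("W2\\.", moves)

# Everything before "W2." (the trailing space is kept)
prefix <- substr(moves, 1, pos - 1)

# TRUE for prefixes that occur in exactly one row
once <- !duplicated(prefix) & !duplicated(prefix, fromLast = TRUE)

print(as.integer(pos))
print(prefix)
print(once)

n_once     <- sum(once)              # prefixes that appear only once
n_distinct <- length(unique(prefix)) # distinct prefixes
```

For this data n_once is 1 and n_distinct is 3. One difference between the two extraction options is worth keeping in mind: if a row contains no "W2." at all, the substr/regexpr version yields an empty string for that row, whereas sub leaves the row unchanged.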
Used data:
df <- structure(list(moves = c("W1.e4 B1.d5 W2.c4 B2.e6 W3.Nc3 B3.Nf6 W4.cxd5 B4.exd5 W5.Bg5",
"W1.e4 B1.d5 W2.exd5 B2.Qxd5 W3.Nc3 B3.Qa5 W4.d4 B4.Nf6 W5.Nf3 B5.c6 W6.Ne5 B6.Bf5",
"W1.e4 B1.e5 W2.Nf3 B2.Nc6 W3.Bc4", "W1.e4 B1.e5 W2.Nf3 B2.Nf6", "W1.e4 B1.c5 W2.Nf3")),
.Names = "moves", class = "data.frame", row.names = c(NA, -5L))
Upvotes: 3
Reputation: 20095
Another option is strsplit with a look-ahead in the split argument, split = " (?=W2\\.)":
length(unique(sapply(strsplit(df$Moves, split = " (?=W2\\.)", perl = TRUE),
function(x)x[1])))
#[1] 3
# where the unique values are:
unique(sapply(strsplit(df$Moves, split = " (?=W2\\.)", perl = TRUE),
function(x)x[1]))
#[1] "W1.e4 B1.d5" "W1.e4 B1.e5" "W1.e4 B1.c5"
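To see what the split produces for a single string, the following sketch applies the same pattern to one shortened example. The names x, pieces and first_part are illustrative.

```r
# One shortened example string
x <- "W1.e4 B1.d5 W2.c4 B2.e6 W3.Nc3"

# Split at the space that is immediately followed by "W2."
# The look-ahead is not consumed, so "W2." stays in the second piece.
pieces <- strsplit(x, split = " (?=W2\\.)", perl = TRUE)[[1]]
print(pieces)

# The part before "W2." is the first piece (no trailing space here)
first_part <- pieces[1]
print(first_part)
```

Because the space itself is the split point, the first piece carries no trailing space, unlike the result of a substr or sub based extraction.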
Regex:
" (?=W2\\.)" -- a space that is followed by W2. (the look-ahead is not consumed, so W2. stays at the start of the second piece)
Data:
df <- read.table(text =
"Moves
'W1.e4 B1.d5 W2.c4 B2.e6 W3.Nc3 B3.Nf6 W4.cxd5 B4.exd5 W5.Bg5'
'W1.e4 B1.d5 W2.exd5 B2.Qxd5 W3.Nc3 B3.Qa5 W4.d4 B4.Nf6 W5.Nf3 B5.c6 W6.Ne5 B6.Bf5'
'W1.e4 B1.e5 W2.Nf3 B2.Nc6 W3.Bc4'
'W1.e4 B1.e5 W2.Nf3 B2.Nf6'
'W1.e4 B1.c5 W2.Nf3'",
header = TRUE, stringsAsFactors = FALSE)
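The count of 3 is the number of distinct openings. If the goal is instead the number of openings that occur in exactly one row (1 for the example, as the question expects), one possible sketch is to tabulate the first pieces. The names moves, first_part, counts, n_once and only_once are illustrative.

```r
# Example data from the question
moves <- c("W1.e4 B1.d5 W2.c4 B2.e6 W3.Nc3 B3.Nf6 W4.cxd5 B4.exd5 W5.Bg5",
           "W1.e4 B1.d5 W2.exd5 B2.Qxd5 W3.Nc3 B3.Qa5 W4.d4 B4.Nf6 W5.Nf3 B5.c6 W6.Ne5 B6.Bf5",
           "W1.e4 B1.e5 W2.Nf3 B2.Nc6 W3.Bc4",
           "W1.e4 B1.e5 W2.Nf3 B2.Nf6",
           "W1.e4 B1.c5 W2.Nf3")

# Part of each string before " W2."
first_part <- sapply(strsplit(moves, split = " (?=W2\\.)", perl = TRUE),
                     function(x) x[1])

# How often each opening occurs
counts <- table(first_part)
print(counts)

# Openings that occur in exactly one row
n_once    <- sum(counts == 1)
only_once <- names(counts)[as.vector(counts) == 1]
print(n_once)
print(only_once)
```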
Upvotes: 0