Reputation: 623
I have a grouped dataset, but the groups don't have a unique identifier, like this:
direction <- c('N','S','W','N','N','S','W','N','S','W')
measurement <- c(4,6,1,7,2,4,7,4,1,4)
x <- data.frame(direction, measurement)
x
direction measurement
1 N 4
2 S 6
3 W 1
4 N 7
5 N 2
6 S 4
7 W 7
8 N 4
9 S 1
10 W 4
In x
, the values for direction
are in order, and each repeating set of 'N', 'S', and 'W' are a group.
Each set of values for direction
always will start with 'N', however, not all groups will have a complete set of 'N', 'S', and 'W' rows (see ID
B below).
Another way of phrasing this: I want to give a unique ID to each set of up to 3 rows starting where direction == 'N'
.
The desired result is this:
direction measurement ID
1 N 4 A
2 S 6 A
3 W 1 A
4 N 7 B
5 N 2 C
6 S 4 C
7 W 7 C
8 N 4 D
9 S 1 D
10 W 4 D
Note that ID
B only has a row for 'N'.
Upvotes: 2
Views: 186
Reputation: 38500
You can get this result with ==
to identify elements with "N" and cumsum
to construct indices. Then pull out the values from the stored vector LETTERS as suggested in balter's answer.
Here, cumsum
sums over a logical vector, coercing it to numeric binary (1s, and 0s). It thus repeats the same value and increments any time an "N" was encountered.
x$ID <- LETTERS[cumsum(x$direction == "N")]
This returns
x
direction measurement ID
1 N 4 A
2 S 6 A
3 W 1 A
4 N 7 B
5 N 2 C
6 S 4 C
7 W 7 C
8 N 4 D
9 S 1 D
10 W 4 D
Upvotes: 3
Reputation: 1095
I think this should work:
direction <- c('N','S','W','N','N','S','W','N','S','W')
measurement <- c(4,6,1,7,2,4,7,4,1,4)
x <- data.frame(direction, measurement)
inds <- which(direction == 'N')
diffs <- diff(c(inds, length(direction)+1))
groups <- rep(seq_along(inds),diffs)
x$ID <- LETTERS[groups]
x
Note that if you have more than 26 groups, LETTERS[]
will start returning NA. You can always just use:
x$ID <- groups
If that happens.
Upvotes: 2