holastello
holastello

Reputation: 623

Assign unique IDs by group, according to pattern

I have a grouped dataset, but the groups don't have a unique identifier, like this:

direction <- c('N','S','W','N','N','S','W','N','S','W')
measurement <- c(4,6,1,7,2,4,7,4,1,4)
x <- data.frame(direction, measurement)
x
   direction measurement
1          N           4
2          S           6
3          W           1
4          N           7
5          N           2
6          S           4
7          W           7
8          N           4
9          S           1
10         W           4

In x, the values for direction are in order, and each repeating set of 'N', 'S', and 'W' are a group.

Each set of values for direction always will start with 'N', however, not all groups will have a complete set of 'N', 'S', and 'W' rows (see ID B below).

Another way of phrasing this: I want to give a unique ID to each set of up to 3 rows starting where direction == 'N'.

The desired result is this:

    direction measurement ID
1          N            4  A
2          S            6  A
3          W            1  A
4          N            7  B
5          N            2  C
6          S            4  C
7          W            7  C
8          N            4  D
9          S            1  D
10         W            4  D

Note that ID B only has a row for 'N'.

Upvotes: 2

Views: 186

Answers (2)

lmo
lmo

Reputation: 38500

You can get this result with == to identify elements with "N" and cumsum to construct indices. Then pull out the values from the stored vector LETTERS as suggested in balter's answer.

Here, cumsum sums over a logical vector, coercing it to numeric binary (1s, and 0s). It thus repeats the same value and increments any time an "N" was encountered.

 x$ID <- LETTERS[cumsum(x$direction == "N")]

This returns

x
   direction measurement ID
1          N           4  A
2          S           6  A
3          W           1  A
4          N           7  B
5          N           2  C
6          S           4  C
7          W           7  C
8          N           4  D
9          S           1  D
10         W           4  D

Upvotes: 3

Balter
Balter

Reputation: 1095

I think this should work:

direction <- c('N','S','W','N','N','S','W','N','S','W')
measurement <- c(4,6,1,7,2,4,7,4,1,4)
x <- data.frame(direction, measurement)

inds <- which(direction == 'N')
diffs <- diff(c(inds, length(direction)+1))

groups <- rep(seq_along(inds),diffs)

x$ID <- LETTERS[groups]
x

Note that if you have more than 26 groups, LETTERS[] will start returning NA. You can always just use:

x$ID <- groups

If that happens.

Upvotes: 2

Related Questions