dandrews

Reputation: 1074

Replacing non-sequential data in a data frame with sequential data (repeated for unique values)

I have a data set that looks like this:

dat <- data.frame(x = c(1, 1, 2, 2, 7, 7, 8, 8),
                  y = rep(c(-1, -2), 4),
                  z = c(0.5, 0.6, 0.6, 0.4, 0.3, 0.3, 0.5, 0.5))

dat
  x  y   z
1 1 -1 0.5
2 1 -2 0.6
3 2 -1 0.6
4 2 -2 0.4
5 7 -1 0.3
6 7 -2 0.3
7 8 -1 0.5
8 8 -2 0.5

The x values represent numeric dates against which I am plotting the y and z values. I need to replace the non-sequential x values with a sequential vector, so that the data becomes:

  x  y   z
1 1 -1 0.5
2 1 -2 0.6
3 2 -1 0.6
4 2 -2 0.4
5 3 -1 0.3
6 3 -2 0.3
7 4 -1 0.5
8 4 -2 0.5

I have tried to replace the values mathematically using a for loop that splits the data into one data frame per unique x value. This has two issues. First, the gaps remain whenever the unique x values are used in a formula such as data$x - min(alldata$x). Second, since each resulting data frame contains only a single unique x value, I cannot replace it inside the loop and still get a result that is unique for each x value across the entire dataset.

I'm just starting with loops and I feel as though there's a different way to iterate across the data to achieve the outcome I require but I haven't been able to figure it out yet.

Upvotes: 1

Views: 98

Answers (2)

Federico Cattai

Reputation: 106

Try replacing the x variable like this:

as.numeric(factor(dat$x))

[1] 1 1 2 2 3 3 4 4

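This first converts x to a factor and then converts the factor back to numeric, which returns the integer codes of the factor levels rather than the original values.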
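
To see why this works, here is a minimal sketch using the `dat` from the question (the column name `x_new` is only illustrative and not part of the original answer). `factor()` stores each value as an integer code pointing into its sorted levels, so the codes are consecutive integers starting at 1.

```r
dat <- data.frame(x = c(1, 1, 2, 2, 7, 7, 8, 8),
                  y = rep(c(-1, -2), 4),
                  z = c(0.5, 0.6, 0.6, 0.4, 0.3, 0.3, 0.5, 0.5))

f <- factor(dat$x)

# The levels are the sorted unique values of x, stored as character strings
lv <- levels(f)

# as.numeric() on a factor returns the level codes, not the original values
dat$x_new <- as.numeric(f)
```

Because the levels are sorted, the new numbering follows the sorted order of x, which may differ from the order in which the values first appear in the data.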

Upvotes: 1

akrun

Reputation: 887048

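With dplyr, this can be done with group_indices:

library(dplyr)
# Note: in dplyr >= 1.0.0, passing columns to group_indices() may emit a
# deprecation warning; dense_rank(x) gives the same numbering for this data.
dat %>% 
    mutate(x = group_indices(., x))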

In base R, an option is match:

dat$x <- with(dat, match(x, unique(x)))
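
One point worth checking before choosing between the base R options is how they behave when x is not already sorted. The sketch below uses a made-up vector `x` (not from the question) to show the difference: `match(x, unique(x))` numbers values in order of first appearance, while matching against `sort(unique(x))` numbers them by value, which is what `as.numeric(factor(x))` also does.

```r
# Hypothetical unsorted input, for illustration only
x <- c(7, 7, 1, 1, 8, 8, 2, 2)

# Numbered by order of first appearance: 7 -> 1, 1 -> 2, 8 -> 3, 2 -> 4
by_appearance <- match(x, unique(x))

# Numbered by sorted value: 1 -> 1, 2 -> 2, 7 -> 3, 8 -> 4
by_value <- match(x, sort(unique(x)))
```

For the data in the question, x is already in increasing order, so both variants give 1 1 2 2 3 3 4 4. Since x represents dates to be plotted, the sorted variant is probably the safer choice if the rows could ever arrive out of order.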

Upvotes: 1
