Jay
Jay

Reputation: 915

rle function in R for groups

Below is how my data looks like.

City, count
Mexico, 1
Mexico, 1
London, 0
London, 1
London, 1

I am using Rle function to count the consistently in my values, but unable to apply the group logic.

I tried loop function, but it didn’t work.

I am looking for output like below

Mexico, 1:2
London, 0:1
London, 1:2

Upvotes: 3

Views: 1891

Answers (5)

alistaire
alistaire

Reputation: 43344

data.table::rleid is a quick way to add a run ID variable to group by, after which aggregation is typical. You can borrow it for a dplyr context, if you like:

library(dplyr)

df <- data_frame(City = c("Mexico", "Mexico", "London", "London", "London"), 
                 count = c(1L, 1L, 0L, 1L, 1L))

df %>% 
    group_by(run = data.table::rleid(City, count), City) %>% 
    summarise(count = paste(count[1], n(), sep = ':'))
#> # A tibble: 3 x 3
#> # Groups:   run [?]
#>     run City   count
#>   <int> <chr>  <chr>
#> 1     1 Mexico 1:2  
#> 2     2 London 0:1  
#> 3     3 London 1:2

But this data isn't big enough to differentiate between ordinary and run grouping. Resampling it to make it more representative dataset,

set.seed(47)    # for reproducibility
df2 <- df %>% slice(sample(nrow(.), 10, replace = TRUE))

df2 %>% 
    group_by(run = data.table::rleid(City, count), City) %>% 
    summarise(count = paste(count[1], n(), sep = ':'))
#> # A tibble: 8 x 3
#> # Groups:   run [?]
#>     run City   count
#>   <int> <chr>  <chr>
#> 1     1 London 1:1  
#> 2     2 Mexico 1:1  
#> 3     3 London 1:2  
#> 4     4 London 0:1  
#> 5     5 London 1:1  
#> 6     6 Mexico 1:1  
#> 7     7 London 0:2  
#> 8     8 London 1:1

If you prefer, the same logic all in data.table:

library(data.table)

setDT(df2)[, 
           .(count = paste(count[1], .N, sep = ':')), 
           by = .(run = rleid(City, count), City)]
#>    run   City count
#> 1:   1 London   1:1
#> 2:   2 Mexico   1:1
#> 3:   3 London   1:2
#> 4:   4 London   0:1
#> 5:   5 London   1:1
#> 6:   6 Mexico   1:1
#> 7:   7 London   0:2
#> 8:   8 London   1:1

or base R:

df2$run <- data.table::rleid(df2$City, df2$count)

aggregate(count ~ City + run, df2, function(x) paste(x[1], length(x), sep = ':'))
#>     City run count
#> 1 London   1   1:1
#> 2 Mexico   2   1:1
#> 3 London   3   1:2
#> 4 London   4   0:1
#> 5 London   5   1:1
#> 6 Mexico   6   1:1
#> 7 London   7   0:2
#> 8 London   8   1:1

Upvotes: 5

Terru_theTerror
Terru_theTerror

Reputation: 5017

Try this solution:

Define a function using rle:

f<-function(x,df)
 {
   var<-rle(df[df$City==x,"count"])
   out<-data.frame(x,cbind(paste(var$values,var$lengths,sep=":")))
   return(out)
 }

Apply it on every city value using lapply

library("dplyr")
df_out<-suppressWarnings(bind_rows(lapply(as.character(unique(df$City)),f,df=df)))
colnames(df_out)<-c("City","count")

Your desired output:

df_out
    City count
1 Mexico   1:2
2 London   0:1
3 London   1:2

Upvotes: 1

erocoar
erocoar

Reputation: 5893

df <- data.frame(city = c("Mexico", "Mexico", "London", "London", "London"),
                 count = c(1, 1, 0, 1, 1))

r <- rle(df$count)
df <- df[!duplicated(df), ] 
df$count <- paste0(r$values, ":", r$lengths)

    city count
1 Mexico   1:2
3 London   0:1
4 London   1:2

Upvotes: 2

G. Grothendieck
G. Grothendieck

Reputation: 269654

Try aggregate: We define n as a synonym for count so that it does not get confused with it on both sides:

aggregate(count ~ City + n, transform(DF, n = count), 
   function(x) paste0(x[1], ":", length(x)))

giving:

    City n count
1 London 0   0:1
2 London 1   1:2
3 Mexico 1   1:2

Note

Lines <- "City, count
Mexico, 1
Mexico, 1
London, 0
London, 1
London, 1"
DF <- read.csv(text = Lines, as.is = TRUE)

Upvotes: 2

Dan
Dan

Reputation: 12074

Here's a solution that uses dplyr. I use count to count the instances of each city-number combination. Then, I stick together the number and the count of that number with a colon separator.

library(dplyr)

df <- data.frame(city = c("Mexico", "Mexico", "London", "London", "London"),
                 nums = c(1, 1, 0, 1, 1))

df %>% 
  count(city, nums) %>% 
  mutate(stuff = paste(nums, n, sep = ":")) %>% 
  select(city, stuff)

# # A tibble: 3 x 2
#   city   stuff
#   <fct>  <chr>
# 1 London 0:1  
# 2 London 1:2  
# 3 Mexico 1:2  

Upvotes: 2

Related Questions