Reputation: 915
Below is how my data looks like.
City, count
Mexico, 1
Mexico, 1
London, 0
London, 1
London, 1
I am using Rle function to count the consistently in my values, but unable to apply the group logic.
I tried loop function, but it didn’t work.
I am looking for output like below
Mexico, 1:2
London, 0:1
London, 1:2
Upvotes: 3
Views: 1891
Reputation: 43344
data.table::rleid
is a quick way to add a run ID variable to group by, after which aggregation is typical. You can borrow it for a dplyr context, if you like:
library(dplyr)
df <- data_frame(City = c("Mexico", "Mexico", "London", "London", "London"),
count = c(1L, 1L, 0L, 1L, 1L))
df %>%
group_by(run = data.table::rleid(City, count), City) %>%
summarise(count = paste(count[1], n(), sep = ':'))
#> # A tibble: 3 x 3
#> # Groups: run [?]
#> run City count
#> <int> <chr> <chr>
#> 1 1 Mexico 1:2
#> 2 2 London 0:1
#> 3 3 London 1:2
But this data isn't big enough to differentiate between ordinary and run grouping. Resampling it to make it more representative dataset,
set.seed(47) # for reproducibility
df2 <- df %>% slice(sample(nrow(.), 10, replace = TRUE))
df2 %>%
group_by(run = data.table::rleid(City, count), City) %>%
summarise(count = paste(count[1], n(), sep = ':'))
#> # A tibble: 8 x 3
#> # Groups: run [?]
#> run City count
#> <int> <chr> <chr>
#> 1 1 London 1:1
#> 2 2 Mexico 1:1
#> 3 3 London 1:2
#> 4 4 London 0:1
#> 5 5 London 1:1
#> 6 6 Mexico 1:1
#> 7 7 London 0:2
#> 8 8 London 1:1
If you prefer, the same logic all in data.table:
library(data.table)
setDT(df2)[,
.(count = paste(count[1], .N, sep = ':')),
by = .(run = rleid(City, count), City)]
#> run City count
#> 1: 1 London 1:1
#> 2: 2 Mexico 1:1
#> 3: 3 London 1:2
#> 4: 4 London 0:1
#> 5: 5 London 1:1
#> 6: 6 Mexico 1:1
#> 7: 7 London 0:2
#> 8: 8 London 1:1
or base R:
df2$run <- data.table::rleid(df2$City, df2$count)
aggregate(count ~ City + run, df2, function(x) paste(x[1], length(x), sep = ':'))
#> City run count
#> 1 London 1 1:1
#> 2 Mexico 2 1:1
#> 3 London 3 1:2
#> 4 London 4 0:1
#> 5 London 5 1:1
#> 6 Mexico 6 1:1
#> 7 London 7 0:2
#> 8 London 8 1:1
Upvotes: 5
Reputation: 5017
Try this solution:
Define a function using rle
:
f<-function(x,df)
{
var<-rle(df[df$City==x,"count"])
out<-data.frame(x,cbind(paste(var$values,var$lengths,sep=":")))
return(out)
}
Apply it on every city value using lapply
library("dplyr")
df_out<-suppressWarnings(bind_rows(lapply(as.character(unique(df$City)),f,df=df)))
colnames(df_out)<-c("City","count")
Your desired output:
df_out
City count
1 Mexico 1:2
2 London 0:1
3 London 1:2
Upvotes: 1
Reputation: 5893
df <- data.frame(city = c("Mexico", "Mexico", "London", "London", "London"),
count = c(1, 1, 0, 1, 1))
r <- rle(df$count)
df <- df[!duplicated(df), ]
df$count <- paste0(r$values, ":", r$lengths)
city count
1 Mexico 1:2
3 London 0:1
4 London 1:2
Upvotes: 2
Reputation: 269654
Try aggregate
: We define n
as a synonym for count
so that it does not get confused with it on both sides:
aggregate(count ~ City + n, transform(DF, n = count),
function(x) paste0(x[1], ":", length(x)))
giving:
City n count
1 London 0 0:1
2 London 1 1:2
3 Mexico 1 1:2
Lines <- "City, count
Mexico, 1
Mexico, 1
London, 0
London, 1
London, 1"
DF <- read.csv(text = Lines, as.is = TRUE)
Upvotes: 2
Reputation: 12074
Here's a solution that uses dplyr
. I use count
to count the instances of each city-number combination. Then, I stick together the number and the count of that number with a colon separator.
library(dplyr)
df <- data.frame(city = c("Mexico", "Mexico", "London", "London", "London"),
nums = c(1, 1, 0, 1, 1))
df %>%
count(city, nums) %>%
mutate(stuff = paste(nums, n, sep = ":")) %>%
select(city, stuff)
# # A tibble: 3 x 2
# city stuff
# <fct> <chr>
# 1 London 0:1
# 2 London 1:2
# 3 Mexico 1:2
Upvotes: 2