Reputation: 21
I am trying to build a cumulative distribution table from a data frame in R, with the result also stored as a data frame.
For example, the data frame looks like this:
A B Count
1 0.5 1
2 0.6 1
3 0.75 1
4 0.77 1
5 0.88 1
I want to create a dataframe that looks like this:
D E(Count)
<0.4 0
<0.5 1
<0.6 2
<0.7 2
<0.8 4
<0.9 5
<1.0 5
<1.1 5
How should I approach this? I have 200 such ranges for which I need the cumulative distribution.
Upvotes: 0
Views: 97
Reputation: 102609
A data.table option using a non-equi join:
library(data.table)
# Join each upper bound b to every row with B <= b, then sum Count per bound
setDT(df)[
  data.table(b = seq(0.4, 1.1, 0.1)),
  on = .(B <= b)
][
  , .(E = sum(Count, na.rm = TRUE)), B
]
gives
B E
1: 0.4 0
2: 0.5 1
3: 0.6 2
4: 0.7 2
5: 0.8 4
6: 0.9 5
7: 1.0 5
8: 1.1 5
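For readers without data.table, the same "sum Count over all rows with B at or below each bound" logic can be sketched in base R. This is only an illustration, not the answerer's method: `df` is rebuilt here from the question's sample rows, and `upper` and `res` are names made up for the example.

```r
# Sample data from the question, rebuilt so the sketch is self-contained
df <- data.frame(A = 1:5,
                 B = c(0.5, 0.6, 0.75, 0.77, 0.88),
                 Count = rep(1L, 5))

# Upper bounds 0.4, 0.5, ..., 1.1 (hypothetical name)
upper <- seq(4, 11) / 10

# For each bound, sum Count over the rows whose B is at or below it
E <- sapply(upper, function(b) sum(df$Count[df$B <= b]))

res <- data.frame(D = upper, E = E)
res
```

This does rows times bounds comparisons, which should be fine for 200 ranges; for much larger inputs the join above is likely the better choice.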
Upvotes: 0
Reputation: 887781
This can be done in a single line:
library(tibble)
tibble(D = seq(0.4, 1.1, by = 0.1), Count = findInterval(D, df1$B))
# A tibble: 8 x 2
D Count
<dbl> <int>
1 0.4 0
2 0.5 1
3 0.6 2
4 0.7 2
5 0.8 4
6 0.9 5
7 1 5
8 1.1 5
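Note that `findInterval` counts rows; it matches the requested totals here because every `Count` is 1 and `B` is already sorted. If `Count` can differ from 1, a possible adaptation is to look up a running total instead. The following is a minimal sketch under that assumption, with `ord`, `running`, `idx` and `out` as illustrative names:

```r
# Sample data from the question
df1 <- data.frame(A = 1:5,
                  B = c(0.5, 0.6, 0.75, 0.77, 0.88),
                  Count = rep(1L, 5))

# findInterval needs sorted break points, so order the rows by B first
ord <- order(df1$B)

# Running total of Count, with a leading 0 for bounds below every B
running <- c(0, cumsum(df1$Count[ord]))

# Upper bounds 0.4, 0.5, ..., 1.1
D <- seq(4, 11) / 10

# idx[i] is the number of B values <= D[i]; shift by 1 to index `running`
idx <- findInterval(D, df1$B[ord])

out <- data.frame(D = D, Count = running[idx + 1])
out
```

Writing the bounds as `seq(4, 11) / 10` rather than `seq(0.4, 1.1, by = 0.1)` is a deliberate choice in this sketch: it keeps each bound equal to its decimal literal, so a `B` that sits exactly on a bound is compared without floating-point drift.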
Or we may use cut
library(dplyr)
library(tidyr)
df1 %>%
  mutate(D = cut(B, breaks = c(-Inf, seq(0.4, 1, by = 0.1)),
                 labels = seq(0.4, 1, by = 0.1))) %>%
  complete(D = as.character(seq(0.4, 1.1, by = 0.1)), fill = list(Count = 0)) %>%
  transmute(D, Count = cumsum(Count)) %>%
  filter(!duplicated(D, fromLast = TRUE))
-output
# A tibble: 8 x 2
D Count
<chr> <dbl>
1 0.4 0
2 0.5 1
3 0.6 2
4 0.7 2
5 0.8 4
6 0.9 5
7 1 5
8 1.1 5
# Data used in the examples above
df1 <- structure(list(A = 1:5, B = c(0.5, 0.6, 0.75, 0.77, 0.88),
    Count = c(1L, 1L, 1L, 1L, 1L)), class = "data.frame",
    row.names = c(NA, -5L))
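The `cut` idea can also be sketched in base R alone, which may be easier to follow step by step: bin `B`, total `Count` per bin (empty bins included), then take the cumulative sum. This is an illustrative variant, not the code above; `upper`, `bin`, `per_bin` and `res` are names chosen for the example.

```r
# Sample data from the question
df1 <- data.frame(A = 1:5,
                  B = c(0.5, 0.6, 0.75, 0.77, 0.88),
                  Count = rep(1L, 5))

# Upper bounds 0.4, 0.5, ..., 1.1
upper <- seq(4, 11) / 10

# Right-closed bins (-Inf, 0.4], (0.4, 0.5], ..., each labelled by its upper bound
bin <- cut(df1$B, breaks = c(-Inf, upper), labels = upper)

# Total Count per bin; xtabs keeps empty factor levels as zeros
per_bin <- as.vector(xtabs(Count ~ bin, data = data.frame(Count = df1$Count, bin = bin)))

# The cumulative sum across bins gives the requested table
res <- data.frame(D = upper, E = cumsum(per_bin))
res
```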
Upvotes: 0
Reputation: 32558
d = data.frame(D = seq(0.4, 1.1, 0.1))
d$E = floor(approx(df1$B, cumsum(df1$Count), d$D, yleft = 0, yright = sum(df1$Count))$y)
d
# D E
#1 0.4 0
#2 0.5 1
#3 0.6 2
#4 0.7 2
#5 0.8 4
#6 0.9 5
#7 1.0 5
#8 1.1 5
Upvotes: 1
Reputation: 79338
You could also do:
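# Bin B into right-closed intervals of width 0.1 without drawing the plot
s <- hist(df$B, plot = FALSE, breaks = seq(0.3, 1.1, 0.1))[c('breaks', 'counts')]
# Drop the first break so each row is labelled by its bin's upper bound
s$breaks <- s$breaks[-1]
transform(s, cumcount = cumsum(counts))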
breaks counts cumcount
1 0.4 0 0
2 0.5 1 1
3 0.6 1 2
4 0.7 0 2
5 0.8 2 4
6 0.9 1 5
7 1.0 0 5
8 1.1 0 5
Upvotes: 1