Nikhil Kakdiya

Reputation: 21

Is there a way I can create a cumulative distribution table in R as a data frame, rather than as histograms or graphs?

I am trying to build a cumulative distribution table for a data frame in R, with the result stored as a data frame as well.

For example, the data frame looks like this:

A  B     Count
1  0.5   1
2  0.6   1
3  0.75  1
4  0.77  1
5  0.88  1

I want to create a data frame like the one below, where E is the running total of Count over all rows whose B is up to the value in D:

D     E(Count) 
<0.4  0
<0.5  1
<0.6  2
<0.7  2
<0.8  4
<0.9  5
<1.0  5
<1.1  5

How should I approach this? I have 200 such ranges to build cumulative distributions for.
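Judging by the expected output, each D acts as an inclusive upper bound (B = 0.5 is already counted in the <0.5 row), and E is the sum of Count over rows with B at or below that bound. A minimal base-R sketch of that definition follows, assuming the thresholds run from 0.4 to 1.1 in steps of 0.1 as in the example; `thresholds`, `hits` and `res` are illustrative names:

```r
# Example data from the question
df <- data.frame(A = 1:5, B = c(0.5, 0.6, 0.75, 0.77, 0.88), Count = rep(1L, 5))

# round() removes floating-point noise from seq() (0.4 + 2 * 0.1 is not exactly 0.6)
thresholds <- round(seq(0.4, 1.1, by = 0.1), 1)

# outer() builds a rows-by-thresholds logical matrix of B <= threshold;
# multiplying by Count weights each row, and colSums() totals each threshold
hits <- outer(df$B, thresholds, "<=")
res <- data.frame(D = thresholds, E = colSums(hits * df$Count))
res
```

The matrix has one cell per row and threshold, so with 200 thresholds and a large data frame the sorted lookups used in the answers below (findInterval, the data.table non-equi join) should scale better.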

Upvotes: 0

Views: 97

Answers (4)

ThomasIsCoding

Reputation: 102609

A data.table option using a non-equi join:

library(data.table)

# Join each threshold b to every row with B <= b, then sum Count per threshold
setDT(df)[
    data.table(b = seq(0.4, 1.1, 0.1)),
    on = .(B <= b)
][
    , .(E = sum(Count, na.rm = TRUE)), B
]

gives

     B E
1: 0.4 0
2: 0.5 1
3: 0.6 2
4: 0.7 2
5: 0.8 4
6: 0.9 5
7: 1.0 5
8: 1.1 5

Upvotes: 0

akrun

Reputation: 887781

This can be done in a single line:

library(tibble)
# findInterval() counts how many values of the (sorted) B are <= each D
tibble(D = seq(0.4, 1.1, by = 0.1), Count = findInterval(D, df1$B))
# A tibble: 8 x 2
      D Count
  <dbl> <int>
1   0.4     0
2   0.5     1
3   0.6     2
4   0.7     2
5   0.8     4
6   0.9     5
7   1       5
8   1.1     5
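findInterval() returns, for each D, how many values of B are less than or equal to it, so it matches E only because every Count in the example is 1. A hedged sketch of a weighted variant, assuming Count can be any non-negative number (`o`, `cum` and `res` are illustrative names), looks up a running total of Count instead:

```r
df1 <- data.frame(A = 1:5, B = c(0.5, 0.6, 0.75, 0.77, 0.88), Count = rep(1L, 5))
D <- round(seq(0.4, 1.1, by = 0.1), 1)

# findInterval() needs a sorted vector, so sort B and reorder Count to match
o <- order(df1$B)
# cum[k + 1] is the total Count of the k smallest B values; cum[1] is 0
cum <- c(0, cumsum(df1$Count[o]))
# findInterval() gives k = number of sorted B values <= each D
res <- data.frame(D = D, E = cum[findInterval(D, df1$B[o]) + 1])
res
```

Repeated B values are handled too, since findInterval() counts every sorted value that is <= D.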

Or we can use cut:

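library(dplyr)
library(tidyr)
df1 %>%
    # label each B with the upper bound of its (lower, upper] bin
    mutate(D = cut(B, breaks = c(-Inf, seq(0.4, 1, by = 0.1)),
        labels = seq(0.4, 1, by = 0.1))) %>%
    # add the thresholds that have no rows, with Count = 0
    complete(D = as.character(seq(0.4, 1.1, by = 0.1)), fill = list(Count = 0)) %>%
    transmute(D, Count = cumsum(Count)) %>%
    # keep the last (largest) running total for each threshold
    filter(!duplicated(D, fromLast = TRUE))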

Output:

# A tibble: 8 x 2
  D     Count
  <chr> <dbl>
1 0.4       0
2 0.5       1
3 0.6       2
4 0.7       2
5 0.8       4
6 0.9       5
7 1         5
8 1.1       5

data

df1 <- structure(list(A = 1:5, B = c(0.5, 0.6, 0.75, 0.77, 0.88), Count = c(1L, 
1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -5L
))

Upvotes: 0

d.b

Reputation: 32558

d = data.frame(D = seq(0.4, 1.1, 0.1))
# Linearly interpolate the running total of Count at each D, then round down with floor()
d$E = floor(approx(df1$B, cumsum(df1$Count), d$D, yleft = 0, yright = sum(df1$Count))$y)
d
#    D E
#1 0.4 0
#2 0.5 1
#3 0.6 2
#4 0.7 2
#5 0.8 4
#6 0.9 5
#7 1.0 5
#8 1.1 5
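Because approx() interpolates linearly by default, floor() recovers the step only when each Count is 1: if the running total jumps from 1 to 4 between two B values, the interpolated line passes through 2 and 3, and floor() can return one of those. A sketch using approx(method = "constant", f = 0), which returns the running total at the last B not exceeding each D, avoids floor() altogether. It assumes B is sorted first; `ties = max` keeps the largest running total if B values repeat:

```r
df1 <- data.frame(A = 1:5, B = c(0.5, 0.6, 0.75, 0.77, 0.88), Count = rep(1L, 5))
d <- data.frame(D = round(seq(0.4, 1.1, 0.1), 1))

# Sort by B so the running total of Count lines up with increasing B
o <- order(df1$B)
# method = "constant" with f = 0 holds the value at the left knot (a step function);
# yleft / yright cover thresholds below the smallest / above the largest B
d$E <- approx(df1$B[o], cumsum(df1$Count[o]), xout = d$D,
              method = "constant", f = 0, ties = max,
              yleft = 0, yright = sum(df1$Count))$y
d
```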

Upvotes: 1

Onyambu

Reputation: 79338

You could also do:

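# Bin B into (0.3, 0.4], (0.4, 0.5], ... without drawing a plot
s <- hist(df$B, plot = FALSE, breaks = seq(0.3, 1.1, 0.1))[c('breaks', 'counts')]
# Drop the first break so each count is labelled by its upper bound
s$breaks <- s$breaks[-1]
transform(s, cumcount = cumsum(counts))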


  breaks counts cumcount
1    0.4      0        0
2    0.5      1        1
3    0.6      1        2
4    0.7      0        2
5    0.8      2        4
6    0.9      1        5
7    1.0      0        5
8    1.1      0        5
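hist() counts rows, so counts ignores the Count column. If Count can differ from 1, a sketch under that assumption bins B with cut() using the same right-closed intervals that hist() uses by default, then sums Count per bin with tapply(); `default = 0` keeps empty bins. `brks`, `bins` and `res` are illustrative names:

```r
df <- data.frame(A = 1:5, B = c(0.5, 0.6, 0.75, 0.77, 0.88), Count = rep(1L, 5))
brks <- round(seq(0.3, 1.1, 0.1), 1)

# Right-closed bins [0.3, 0.4], (0.4, 0.5], ..., matching hist()'s defaults
bins <- cut(df$B, breaks = brks, right = TRUE, include.lowest = TRUE)
# Sum Count within each bin; default = 0 fills bins that have no rows
counts <- as.vector(tapply(df$Count, bins, sum, default = 0))
res <- data.frame(breaks = brks[-1], counts = counts, cumcount = cumsum(counts))
res
```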
  

Upvotes: 1
