Reputation: 11
My name is Natasa, and I’m new to R. I’m impressed by what R can do, but unfortunately I don’t have time to learn it from the beginning.
I have a lot of vectors (11) with 10000 values each, so I will use a more “compact” version here. Let’s say that I have 4 vectors, where TI = Time and RE = Region (1, 2 or 3):
TI <- c(10, 20, 30, 40, 50, 100, 150, 200, 300)
RE1 <- c(0.25, 0.78, 0.35, 0.37, 4.56, 5.23, 3.75, 8.51, 10.85)
RE2 <- c(0.05, 1.54, 0.4, 0.42, 2.53, 1.38, 4.58, 10.54, 25.35)
RE3 <- c(0.02, 0.53, 0.72, 0.28, 7.82, 13.51, 23.54, 2.15)
I want to create groups of “TI” (the time series: group1 = the TI values 10, 20, 30 and 40; group2 = values between 50 and 150; group3 = 200 and 300) and compute the mean and standard deviation of each RE vector within each TI group. The groups have unequal lengths, and I don’t know how many values each group contains (only its range). My final goal is a grouped bar plot: the x axis shows the TI groups (the time series), the y axis shows the region values, and within each time group there is a separate bar for each region.
I have found several pages on the internet and tried several things, but without any success. My thoughts were:
The only problem is that I can’t find the correct way to split the table into the desired groups, or an “easy” way to rename specific values of TI (thought 2). Wanted table (if my “thoughts” are correct):
TI RE1 RE2 RE3
group1 0.25 0.05 0.02
group1 0.78 1.54 0.53
group1 0.35 0.4 0.72
group1 0.37 0.42 0.28
group2 4.56 2.53 7.82
group2 5.23 1.38 13.51
group2 3.75 4.58 23.54
group3 8.51 10.54 2.15
group3 10.85 25.35 0.65
Since my data is large, I don’t think calling the replace function for each value is “affordable”. My other thought was to compute the mean and SD separately for each TI group and each RE, then insert a column with the desired group names and combine all the “tables” into one, but that would be very time-consuming and impractical. Is there a way to tell R to rename all TI values between 10 and 40 to group1, values between 50 and 150 to group2, and so on, or to declare that the numbers in a given range form a group? If not, is there an easier way to compute the mean and SD over a specific range of values of a different vector? Or is none of this needed, and can I do it directly with the barplot function (I also tried that, without success)?
It is really hard for me to figure this out with such limited experience, so any help will be greatly appreciated! Thanks in advance for your responses.
Upvotes: 1
Views: 1052
Reputation: 1652
For picking out values in a group, the %in% construct is handy, although Froom's suggestion with < and > is more robust (it also works for values that are not whole numbers).
a <- c(10, 13, 18, 21, 15, 32)
a %in% 10:20
# [1] TRUE TRUE TRUE FALSE TRUE FALSE
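To see the difference, here is a small sketch with a made-up vector b that contains a non-integer time (10.5): %in% 10:20 compares against the integers 10 through 20 only, while a range comparison tests the interval itself.

```r
# Hypothetical example vector containing a non-integer time
b <- c(10, 10.5, 18, 21, 20)

# %in% only matches the whole numbers 10, 11, ..., 20
in_set <- b %in% 10:20
in_set
# [1]  TRUE FALSE  TRUE FALSE  TRUE

# A range comparison matches any value in [10, 20]
in_range <- b >= 10 & b <= 20
in_range
# [1]  TRUE  TRUE  TRUE FALSE  TRUE
```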
For summarizing and generally working with data, I would check out the data.table package.
library(data.table)
data <- data.table(TI = c(10, 20, 30, 40, 50, 100, 150, 200, 300),
RE1 = c(0.25, 0.78, 0.35, 0.37, 4.56, 5.23, 3.75, 8.51, 10.85),
RE2 = c(0.05, 1.54, 0.4, 0.42, 2.53, 1.38, 4.58, 10.54, 25.35),
RE3 = c(0.02, 0.53, 0.72, 0.28, 7.82, 13.51, 23.54, 2.15, NA))
g1 <- 1:40
g2 <- 41:150
data[TI %in% g1, gp := "group1"]
data[TI %in% g2, gp := "group2"]
data[TI > 150, gp := "group3"]
data
# TI RE1 RE2 RE3 gp
# 1: 10 0.25 0.05 0.02 group1
# 2: 20 0.78 1.54 0.53 group1
# 3: 30 0.35 0.40 0.72 group1
# 4: 40 0.37 0.42 0.28 group1
# 5: 50 4.56 2.53 7.82 group2
# 6: 100 5.23 1.38 13.51 group2
# 7: 150 3.75 4.58 23.54 group2
# 8: 200 8.51 10.54 2.15 group3
# 9: 300 10.85 25.35 NA group3
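The three assignments above can also be collapsed into one call with base R's cut(), which maps every value of a numeric vector to the interval it falls in; this is essentially the "rename all numbers between 10 and 40 to group1" operation the question asks for. This sketch assumes the group boundaries are (0, 40], (40, 150] and (150, Inf); with a data.table the same call can be used as data[, gp := cut(TI, ...)].

```r
TI <- c(10, 20, 30, 40, 50, 100, 150, 200, 300)

# cut() uses right-closed intervals by default: (0,40], (40,150], (150,Inf]
gp <- cut(TI,
          breaks = c(0, 40, 150, Inf),
          labels = c("group1", "group2", "group3"))
gp
# [1] group1 group1 group1 group1 group2 group2 group2 group3 group3
# Levels: group1 group2 group3
```

Because cut() works on intervals rather than a list of allowed values, it also handles non-integer times.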
The := operator assigns by reference, which can be used to overwrite an existing column or create a new one. It's basically the same thing as data$gp <- ..., but without copying the whole table. Also, as you may have noticed, a nice feature of data.tables is that they implicitly use with syntax; i.e. data.table knows you're talking about its columns, so you don't have to write data$... every time.
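For comparison, base R offers the same column-lookup convenience through with(). A minimal sketch on a small, made-up data.frame named df:

```r
df <- data.frame(TI = c(10, 20, 200, 300), RE1 = c(0.25, 0.78, 8.51, 10.85))

# Without with(): every column needs the df$ prefix
late_explicit <- df$RE1[df$TI > 150]

# With with(): bare names are looked up as columns of df
late_with <- with(df, RE1[TI > 150])

late_with
# [1]  8.51 10.85
```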
Then, summarizing is really easy.
data[, lapply(.SD, mean, na.rm=TRUE), by = gp, .SDcols=c("RE1", "RE2", "RE3")]
# gp RE1 RE2 RE3
# 1: group1 0.437500 0.6025 0.38750
# 2: group2 4.513333 2.8300 14.95667
# 3: group3 9.680000 17.9450 2.15000
This syntax is a little strange, but here's the gist: lapply(l, FUN, ...) takes a list or vector (l) and applies the function (FUN) to every element of l, with ... as additional arguments to FUN. Here, .SD ("Subset of Data") refers to the rows of the data.table you're working in (data) that belong to the current group, so in words, that whole block says "apply the function mean, with argument na.rm=TRUE, to every column of the data.table I'm working on". by splits the rows into groups (in this case, by column gp) and runs the calculation once per group. Finally, .SDcols indicates by name which columns to include in .SD. The by column gp is never part of .SD, but omitting .SDcols would make .SD include every other column, so you would also get the mean of TI, which is meaningless for your purposes.
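Since the question asks for the standard deviation as well as the mean, here is a base-R sketch (no data.table needed) that computes both per group, using cut() for the grouping and tapply() on each RE column. The group boundaries (0, 40], (40, 150], (150, Inf) are an assumption taken from the question's description. Note that sd() returns NA for a group with only one non-missing value (group3 for RE3 here).

```r
# Example data from the question; RE3 is one value short, padded with NA
data <- data.frame(
  TI  = c(10, 20, 30, 40, 50, 100, 150, 200, 300),
  RE1 = c(0.25, 0.78, 0.35, 0.37, 4.56, 5.23, 3.75, 8.51, 10.85),
  RE2 = c(0.05, 1.54, 0.4, 0.42, 2.53, 1.38, 4.58, 10.54, 25.35),
  RE3 = c(0.02, 0.53, 0.72, 0.28, 7.82, 13.51, 23.54, 2.15, NA)
)

# Assign each row to a TI group
data$gp <- cut(data$TI, breaks = c(0, 40, 150, Inf),
               labels = c("group1", "group2", "group3"))

re_cols <- c("RE1", "RE2", "RE3")

# For each RE column, summarise within each group;
# each result is a groups x regions matrix
group_means <- sapply(data[re_cols], function(x) tapply(x, data$gp, mean, na.rm = TRUE))
group_sds   <- sapply(data[re_cols], function(x) tapply(x, data$gp, sd,   na.rm = TRUE))

round(group_means, 4)
round(group_sds, 4)
```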
Upvotes: 0
Reputation: 1279
If you want your groups to be unevenly split (as in your example), then the following may be helpful, although there is likely a slicker way of doing it.
I have used the dplyr package to get the summaries by group; you will need to install it (install.packages("dplyr")) if you don't already have it.
data <- data.frame(TI = c(10, 20, 30, 40, 50, 100, 150, 200, 300),
RE1 = c(0.25, 0.78, 0.35, 0.37, 4.56, 5.23, 3.75, 8.51, 10.85),
RE2 = c(0.05, 1.54, 0.4, 0.42, 2.53, 1.38, 4.58, 10.54, 25.35),
RE3 = c(0.02, 0.53, 0.72, 0.28, 7.82, 13.51, 23.54, 2.15, NA))
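# Label each row by its TI range; > 40 and <= 150 leave no gaps between groups
data$gp <- NA
data$gp[data$TI > 0 & data$TI <= 40] <- "g1"
data$gp[data$TI > 40 & data$TI <= 150] <- "g2"
data$gp[data$TI > 150] <- "g3"
library(dplyr)
data <- group_by(data, gp)
# Name the summary columns so the output is readable
summarise(data, RE1_mean = mean(RE1, na.rm = TRUE), RE2_mean = mean(RE2, na.rm = TRUE), RE3_mean = mean(RE3, na.rm = TRUE))
summarise(data, RE1_sd = sd(RE1, na.rm = TRUE), RE2_sd = sd(RE2, na.rm = TRUE), RE3_sd = sd(RE3, na.rm = TRUE))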
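For the grouped bar plot the question ultimately asks for, a minimal base-graphics sketch, assuming the same (0, 40], (40, 150], (150, Inf) grouping: barplot() with beside = TRUE draws one cluster per column of its matrix, so the groups x regions matrix of means is transposed to put the TI groups on the x axis with one bar per region. The ±1 SD error bars drawn with arrows() are an addition beyond the question.

```r
data <- data.frame(
  TI  = c(10, 20, 30, 40, 50, 100, 150, 200, 300),
  RE1 = c(0.25, 0.78, 0.35, 0.37, 4.56, 5.23, 3.75, 8.51, 10.85),
  RE2 = c(0.05, 1.54, 0.4, 0.42, 2.53, 1.38, 4.58, 10.54, 25.35),
  RE3 = c(0.02, 0.53, 0.72, 0.28, 7.82, 13.51, 23.54, 2.15, NA)
)
data$gp <- cut(data$TI, breaks = c(0, 40, 150, Inf),
               labels = c("group1", "group2", "group3"))

re_cols <- c("RE1", "RE2", "RE3")
group_means <- sapply(data[re_cols], function(x) tapply(x, data$gp, mean, na.rm = TRUE))
group_sds   <- sapply(data[re_cols], function(x) tapply(x, data$gp, sd,   na.rm = TRUE))

# Transpose: rows = regions (bars), columns = TI groups (x axis clusters)
heights <- t(group_means)
upper   <- heights + t(group_sds)
lower   <- heights - t(group_sds)

# barplot() returns the bar midpoints (a matrix when beside = TRUE)
mids <- barplot(heights, beside = TRUE,
                ylim = c(0, max(upper, na.rm = TRUE) * 1.1),
                legend.text = TRUE, args.legend = list(x = "topleft"),
                xlab = "TI group", ylab = "Value")

# +/- 1 SD error bars; bars whose SD is NA get no error bar
arrows(mids, lower, y1 = upper, angle = 90, code = 3, length = 0.05)
```

The legend labels come from the row names of heights (RE1, RE2, RE3); the y limit is widened so the tallest error bar is not clipped.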
Upvotes: 0