Reputation: 23
I have a data file like this:
structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("2020-07-26 00:00:00|Monitor1.txt|01",
"2020-07-26 00:00:00|Monitor1.txt|02", "2020-07-26 00:00:00|Monitor1.txt|03",
"2020-07-26 00:00:00|Monitor1.txt|04", "2020-07-26 00:00:00|Monitor1.txt|05",
"2020-07-26 00:00:00|Monitor1.txt|06", "2020-07-26 00:00:00|Monitor1.txt|07",
"2020-07-26 00:00:00|Monitor1.txt|08", "2020-07-26 00:00:00|Monitor1.txt|09",
"2020-07-26 00:00:00|Monitor1.txt|10", "2020-07-26 00:00:00|Monitor1.txt|11",
"2020-07-26 00:00:00|Monitor1.txt|12", "2020-07-26 00:00:00|Monitor1.txt|13",
"2020-07-26 00:00:00|Monitor1.txt|14", "2020-07-26 00:00:00|Monitor1.txt|15",
"2020-07-26 00:00:00|Monitor1.txt|16", "2020-07-26 00:00:00|Monitor1.txt|17",
"2020-07-26 00:00:00|Monitor1.txt|18", "2020-07-26 00:00:00|Monitor1.txt|19",
"2020-07-26 00:00:00|Monitor1.txt|20", "2020-07-26 00:00:00|Monitor1.txt|21",
"2020-07-26 00:00:00|Monitor1.txt|22", "2020-07-26 00:00:00|Monitor1.txt|23",
"2020-07-26 00:00:00|Monitor1.txt|24", "2020-07-26 00:00:00|Monitor1.txt|25",
"2020-07-26 00:00:00|Monitor1.txt|26", "2020-07-26 00:00:00|Monitor1.txt|27",
"2020-07-26 00:00:00|Monitor1.txt|28", "2020-07-26 00:00:00|Monitor1.txt|29",
"2020-07-26 00:00:00|Monitor1.txt|30", "2020-07-26 00:00:00|Monitor1.txt|31",
"2020-07-26 00:00:00|Monitor1.txt|32"), class = "factor"), t = c(60,
120, 180, 240, 300, 360), activity = c(0L, 0L, 0L, 0L, 0L, 0L
), moving = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), asleep = c(TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE), Day = c(1, 1, 1, 1, 1, 1)), row.names = c(NA,
-6L), class = c("behavr", "data.table", "data.frame"), sorted = "id", .internal.selfref = <pointer: 0x0000019c94541ef0>, metadata = structure(list(
id = structure(1L, .Label = c("2020-07-26 00:00:00|Monitor1.txt|01",
"2020-07-26 00:00:00|Monitor1.txt|02", "2020-07-26 00:00:00|Monitor1.txt|03",
"2020-07-26 00:00:00|Monitor1.txt|04", "2020-07-26 00:00:00|Monitor1.txt|05",
"2020-07-26 00:00:00|Monitor1.txt|06", "2020-07-26 00:00:00|Monitor1.txt|07",
"2020-07-26 00:00:00|Monitor1.txt|08", "2020-07-26 00:00:00|Monitor1.txt|09",
"2020-07-26 00:00:00|Monitor1.txt|10", "2020-07-26 00:00:00|Monitor1.txt|11",
"2020-07-26 00:00:00|Monitor1.txt|12", "2020-07-26 00:00:00|Monitor1.txt|13",
"2020-07-26 00:00:00|Monitor1.txt|14", "2020-07-26 00:00:00|Monitor1.txt|15",
"2020-07-26 00:00:00|Monitor1.txt|16", "2020-07-26 00:00:00|Monitor1.txt|17",
"2020-07-26 00:00:00|Monitor1.txt|18", "2020-07-26 00:00:00|Monitor1.txt|19",
"2020-07-26 00:00:00|Monitor1.txt|20", "2020-07-26 00:00:00|Monitor1.txt|21",
"2020-07-26 00:00:00|Monitor1.txt|22", "2020-07-26 00:00:00|Monitor1.txt|23",
"2020-07-26 00:00:00|Monitor1.txt|24", "2020-07-26 00:00:00|Monitor1.txt|25",
"2020-07-26 00:00:00|Monitor1.txt|26", "2020-07-26 00:00:00|Monitor1.txt|27",
"2020-07-26 00:00:00|Monitor1.txt|28", "2020-07-26 00:00:00|Monitor1.txt|29",
"2020-07-26 00:00:00|Monitor1.txt|30", "2020-07-26 00:00:00|Monitor1.txt|31",
"2020-07-26 00:00:00|Monitor1.txt|32"), class = "factor"),
file_info = list(list(path = "C:/Users/ariji/Desktop/ShinyWrapperForCircadianAnalysis/Monitor1.txt",
file = "Monitor1.txt")), region_id = 1L, experiment_id = "2020-07-26 00:00:00|Monitor1.txt",
start_datetime = structure(1595721600, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), stop_datetime = structure(1596326400, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), genotype = "Early", replicate = 1L,
uid = 1L), sorted = "id", class = c("data.table", "data.frame"
), row.names = c(NA, -1L), .internal.selfref = <pointer: 0x0000019c94541ef0>))
Full file here - https://anonymousfiles.io/1vypEs9u/ (read in with fread
).
What I need to do is - make another column to this data.table called noramct
. The values in noramct
should be (activity
/sum of all activity
for that day (1,2,3...7). This has to be done by each individual by the column id
. So basically, for every id
, I want a normalized activity (that particular id
's activity
divided by that particular day's activity of that particular id
). Remember the sum of one day's activity has two levels, all activity
from one particular id and one particular day
, it may be confusing because one day's activity
will have counts from multiple id
s.
Any help will be greatly appreciated! Thanks in anticipation.
Question has been asked in Reddit also.
Upvotes: 0
Views: 57
Reputation: 23
Answer from mrmogel on reddit which solved my question first:
You can do this in many ways, but with tidyverse
this sounds like a job for group_by
.
myData %>%
group_by(id, Day) %>%
mutate(activityNorm = activity/sum(activity)) %>%
ungroup()
What this does is effectively splits your data into groups by id
and Day
, and then does the mutate
by each group. That means the sum will be unique for each id and Day.
Alternatively in data.table
(note: my data.table
skills are a little rusty).
myData[, activityNorm := activity/sum(activity), by = .(id, Day)]
Upvotes: 0
Reputation: 152
Not sure if this is what you meant, but I think it suffices to first create the denominator for each observation (I understand it is the total activity of its corresponding id and day), and then simply divide each value by its corresponding denominator. Luckily, this is quite straightforward in data.table:
data[, day_activity_for_id := sum(activity), by = .(id, day)
][, noramct := activity/day_activity_for_id]
Also, friendly advice for next time: it is much easier to understand your problem if you show us a print of the head of the table, not it's cumbersome structure! data.table prints it quite cleanly in your console
head(data)
Upvotes: 1