Reputation: 23

Need help creating a column based on calculations on three other columns in R

I have a data file like this:

structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("2020-07-26 00:00:00|Monitor1.txt|01",
"2020-07-26 00:00:00|Monitor1.txt|02", "2020-07-26 00:00:00|Monitor1.txt|03",
"2020-07-26 00:00:00|Monitor1.txt|04", "2020-07-26 00:00:00|Monitor1.txt|05",
"2020-07-26 00:00:00|Monitor1.txt|06", "2020-07-26 00:00:00|Monitor1.txt|07",
"2020-07-26 00:00:00|Monitor1.txt|08", "2020-07-26 00:00:00|Monitor1.txt|09",
"2020-07-26 00:00:00|Monitor1.txt|10", "2020-07-26 00:00:00|Monitor1.txt|11",
"2020-07-26 00:00:00|Monitor1.txt|12", "2020-07-26 00:00:00|Monitor1.txt|13",
"2020-07-26 00:00:00|Monitor1.txt|14", "2020-07-26 00:00:00|Monitor1.txt|15",
"2020-07-26 00:00:00|Monitor1.txt|16", "2020-07-26 00:00:00|Monitor1.txt|17",
"2020-07-26 00:00:00|Monitor1.txt|18", "2020-07-26 00:00:00|Monitor1.txt|19",
"2020-07-26 00:00:00|Monitor1.txt|20", "2020-07-26 00:00:00|Monitor1.txt|21",
"2020-07-26 00:00:00|Monitor1.txt|22", "2020-07-26 00:00:00|Monitor1.txt|23",
"2020-07-26 00:00:00|Monitor1.txt|24", "2020-07-26 00:00:00|Monitor1.txt|25",
"2020-07-26 00:00:00|Monitor1.txt|26", "2020-07-26 00:00:00|Monitor1.txt|27",
"2020-07-26 00:00:00|Monitor1.txt|28", "2020-07-26 00:00:00|Monitor1.txt|29",
"2020-07-26 00:00:00|Monitor1.txt|30", "2020-07-26 00:00:00|Monitor1.txt|31",
"2020-07-26 00:00:00|Monitor1.txt|32"), class = "factor"), t = c(60,
120, 180, 240, 300, 360), activity = c(0L, 0L, 0L, 0L, 0L, 0L
), moving = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), asleep = c(TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE), Day = c(1, 1, 1, 1, 1, 1)), row.names = c(NA,
-6L), class = c("behavr", "data.table", "data.frame"), sorted = "id", .internal.selfref = <pointer: 0x0000019c94541ef0>, metadata = structure(list(
id = structure(1L, .Label = c("2020-07-26 00:00:00|Monitor1.txt|01",
"2020-07-26 00:00:00|Monitor1.txt|02", "2020-07-26 00:00:00|Monitor1.txt|03",
"2020-07-26 00:00:00|Monitor1.txt|04", "2020-07-26 00:00:00|Monitor1.txt|05",
"2020-07-26 00:00:00|Monitor1.txt|06", "2020-07-26 00:00:00|Monitor1.txt|07",
"2020-07-26 00:00:00|Monitor1.txt|08", "2020-07-26 00:00:00|Monitor1.txt|09",
"2020-07-26 00:00:00|Monitor1.txt|10", "2020-07-26 00:00:00|Monitor1.txt|11",
"2020-07-26 00:00:00|Monitor1.txt|12", "2020-07-26 00:00:00|Monitor1.txt|13",
"2020-07-26 00:00:00|Monitor1.txt|14", "2020-07-26 00:00:00|Monitor1.txt|15",
"2020-07-26 00:00:00|Monitor1.txt|16", "2020-07-26 00:00:00|Monitor1.txt|17",
"2020-07-26 00:00:00|Monitor1.txt|18", "2020-07-26 00:00:00|Monitor1.txt|19",
"2020-07-26 00:00:00|Monitor1.txt|20", "2020-07-26 00:00:00|Monitor1.txt|21",
"2020-07-26 00:00:00|Monitor1.txt|22", "2020-07-26 00:00:00|Monitor1.txt|23",
"2020-07-26 00:00:00|Monitor1.txt|24", "2020-07-26 00:00:00|Monitor1.txt|25",
"2020-07-26 00:00:00|Monitor1.txt|26", "2020-07-26 00:00:00|Monitor1.txt|27",
"2020-07-26 00:00:00|Monitor1.txt|28", "2020-07-26 00:00:00|Monitor1.txt|29",
"2020-07-26 00:00:00|Monitor1.txt|30", "2020-07-26 00:00:00|Monitor1.txt|31",
"2020-07-26 00:00:00|Monitor1.txt|32"), class = "factor"),
file_info = list(list(path = "C:/Users/ariji/Desktop/ShinyWrapperForCircadianAnalysis/Monitor1.txt",
file = "Monitor1.txt")), region_id = 1L, experiment_id = "2020-07-26 00:00:00|Monitor1.txt",
start_datetime = structure(1595721600, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), stop_datetime = structure(1596326400, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), genotype = "Early", replicate = 1L,
uid = 1L), sorted = "id", class = c("data.table", "data.frame"
), row.names = c(NA, -1L), .internal.selfref = <pointer: 0x0000019c94541ef0>))

Full file here - https://anonymousfiles.io/1vypEs9u/ (read in with fread). What I need to do is - make another column to this data.table called noramct. The values in noramct should be (activity/sum of all activity for that day (1,2,3...7). This has to be done by each individual by the column id. So basically, for every id, I want a normalized activity (that particular id's activity divided by that particular day's activity of that particular id). Remember the sum of one day's activity has two levels, all activity from one particular id and one particular day, it may be confusing because one day's activity will have counts from multiple ids. Any help will be greatly appreciated! Thanks in anticipation. Question has been asked in Reddit also.

Upvotes: 0

Answers (2)

cryptodice

Reputation: 23

Answer from mrmogel on reddit which solved my question first:

You can do this in many ways, but with tidyverse this sounds like a job for group_by.

myData %>%
  group_by(id, Day) %>%
  mutate(activityNorm = activity/sum(activity)) %>%
  ungroup()

What this does is effectively splits your data into groups by id and Day, and then does the mutate by each group. That means the sum will be unique for each id and Day.

Alternatively in data.table (note: my data.table skills are a little rusty).

myData[, activityNorm := activity/sum(activity), by = .(id, Day)]

Upvotes: 0

pheymanss

Reputation: 152

Not sure if this is what you meant, but I think it suffices to first create the denominator for each observation (I understand it is the total activity of its corresponding id and day), and then simply divide each value by its corresponding denominator. Luckily, this is quite straightforward in data.table:

data[, day_activity_for_id := sum(activity), by = .(id, day)
   ][, noramct := activity/day_activity_for_id]

Also, friendly advice for next time: it is much easier to understand your problem if you show us a print of the head of the table, not it's cumbersome structure! data.table prints it quite cleanly in your console

head(data)

Upvotes: 1

Need help creating a column based on calculations on three other columns in R

Answers (2)

Related Questions