Reputation: 33
I have a large dataframe (my_DF) with 4 important columns: ID (1-->100), YEAR (2000, 2001, 2002, 2003, 2004, 2005), MONTH (January --> December), LENGHT (continuous values from 0.1 to 1.0). A thing like:
YEAR MONTH ID LENGHT
1 2000 january S1 0.2
2 2000 january S1 0.3
3 2000 january S1 0.1
4 2000 january S2 0.5
5 2000 january S2 0.3
6 2000 february S1 0.9
7 2000 february S1 0.4
8 2000 february S1 0.6
9 2000 february S3 0.4
10 2000 february S3 0.3
11 2000 march S1 0.7
...
I need to add a new column to the dataframe, filled with the median lenght for each unique situation, so for each ID, YEAR and MONTH values.
I succeded in obtain the values I was looking for, thanks to aggregate:
agg <- aggregate(my_DF["LENGHT"], by = list(my_DF$YEAR, my_DF$MONTH, my_DF$ID), median)
In this way I obtain the values I wanted, but, of course, this just create a new dataframe. I'm failing in finding a rapid way to paste the values of the dataframe "agg" in a new column of dataframe "my_DF", basing on the YEAR, MONTH and ID corrispondence.
For instance, I want obtain a thing like:
YEAR MONTH ID LENGHT MONTHLY_LENGHT_MEDIAN
1 2000 january S1 0.2 0.2
2 2000 january S1 0.3 0.2
3 2000 january S1 0.1 0.2
4 2000 january S2 0.5 0.4
5 2000 january S2 0.3 0.4
6 2000 february S1 0.9 0.6
7 2000 february S1 0.4 0.6
8 2000 february S1 0.6 0.6
9 2000 february S3 0.4 0.35
10 2000 february S3 0.3 0.35
11 2000 march S1 0.7 0.7
So, I'm wondering if a conditional command is suitable in my case (if, ifelse...).
Unfortunately, I'm not good with these commands... How could I do?
Thanks for helping!
Upvotes: 2
Views: 67
Reputation: 2764
You can simply do this in one line using data.table
library(data.table)
setDT(dt)[ , median := median(LENGHT), by = .(YEAR,MONTH,ID)]
> dt
YEAR MONTH ID LENGHT mean median
1: 2000 january S1 0.2 0.20 0.20
2: 2000 january S1 0.3 0.20 0.20
3: 2000 january S1 0.1 0.20 0.20
4: 2000 january S2 0.5 0.40 0.40
5: 2000 january S2 0.3 0.40 0.40
6: 2000 february S1 0.9 0.60 0.60
7: 2000 february S1 0.4 0.60 0.60
8: 2000 february S1 0.6 0.60 0.60
9: 2000 february S3 0.4 0.35 0.35
10: 2000 february S3 0.3 0.35 0.35
Upvotes: 1
Reputation: 887223
Instead of summarising with aggregate
and then merge
, directly use ave
from base R
to create a column
my_df$MONTHLY_LENGHT_MEDIAN <- with(my_df, ave(LENGHT, YEAR,
MONTH, ID, FUN = median))
mydf$MONTHLY_LENGHT_MEDIAN
#[1] 0.20 0.20 0.20 0.40 0.40 0.60 0.60 0.60 0.35 0.35
Or with tidyverse
library(tidyverse)
my_df %>%
group_by(LENGHT, YEAR, MONTH) %>%
mutate(MONTHLY_LENGHT_MEDIAN = median(LENGHT))
my_df <- structure(list(YEAR = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L,
2000L, 2000L, 2000L, 2000L), MONTH = c("january", "january",
"january", "january", "january", "february", "february", "february",
"february", "february"), ID = c("S1", "S1", "S1", "S2", "S2",
"S1", "S1", "S1", "S3", "S3"), LENGHT = c(0.2, 0.3, 0.1, 0.5,
0.3, 0.9, 0.4, 0.6, 0.4, 0.3)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
Upvotes: 2