Reputation: 330
I am working with R.
Here I share a sample of my data...
structure(list(column_a = c("1_1", "1_1", "1_2", "1_2", "1_2",
"2_1", "2_2", "2_2", "3_1", "3_2"), column_b = c("kitchen", "tree",
"hate", "kind", "table", "dog", "human", "car", "moon", "rage"
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
column_a column_b
1 1_1 kitchen
2 1_1 tree
3 1_2 hate
4 1_2 kind
5 1_2 table
6 2_1 dog
7 2_2 human
8 2_2 car
9 3_1 moon
10 3_2 rage
I need to calculate the mean value of the total words produced by condition (1_1, 1_2, etc.). The only problem that I have is that the conditions that finish with "_1" have a total number of 50 and the conditions that end with "_2" have a total number of 100.
So, since the condition "1_1" has two words produced (in the sample) I should calculate the mean by diving by 50. This is 2/50 = 0.04. However, when calculating the mean for the "1_2" condition I need to divide by 100. This is 3/100 = 0.03.
I need to create a column that has the mean value of the total words produced by condition, considering that some condition needs to be calculated by 50 and the other ones by a 100. How can I do this and still have the results in the same column?
Upvotes: 3
Views: 70
Reputation: 102770
A base R option using ave
+ endsWith
transform(
df,
value = ave(column_a,
column_a,
FUN = function(x) length(x) / ifelse(endsWith(x, "_1"), 50, 100)
)
)
gives
column_a column_b value
1 1_1 kitchen 0.04
2 1_1 tree 0.04
3 1_2 hate 0.03
4 1_2 kind 0.03
5 1_2 table 0.03
6 2_1 dog 0.02
7 2_2 human 0.02
8 2_2 car 0.02
9 3_1 moon 0.02
10 3_2 rage 0.01
Upvotes: 2
Reputation: 887951
Using data.table
library(data.table)
setDT(df)[, n := .N, column_a][, .(mean_group = mean(n)), by = column_a]
Upvotes: 2
Reputation: 79286
Just count by group and calculate the mean:
df %>%
group_by(column_a) %>%
add_count() %>%
summarise(mean_group = mean(n))
Output:
column_a mean_group
<chr> <dbl>
1 1_1 2
2 1_2 3
3 2_1 1
4 2_2 2
5 3_1 1
6 3_2 1
Upvotes: 6
Reputation: 389325
You can use case_when
to include a new column (divide
) which is either 50 or 100 and for divide number of rows in the group (column_a
) by it.
library(dplyr)
df %>%
mutate(divide = case_when(endsWith(column_a, '_1') ~ 50,
endsWith(column_a, '_2') ~ 100)) %>%
group_by(column_a) %>%
mutate(value = n()/divide) %>%
ungroup
# column_a column_b divide value
# <chr> <chr> <dbl> <dbl>
# 1 1_1 kitchen 50 0.04
# 2 1_1 tree 50 0.04
# 3 1_2 hate 100 0.03
# 4 1_2 kind 100 0.03
# 5 1_2 table 100 0.03
# 6 2_1 dog 50 0.02
# 7 2_2 human 100 0.02
# 8 2_2 car 100 0.02
# 9 3_1 moon 50 0.02
#10 3_2 rage 100 0.01
Also similar with add_count
-
library(dplyr)
df %>%
mutate(divide = case_when(endsWith(column_a, '_1') ~ 50,
endsWith(column_a, '_2') ~ 100)) %>%
add_count(column_a) %>%
mutate(value = n/divide)
ungroup
Upvotes: 4