Reputation: 277
Suppose I have two data tables
df1
x y f(x,y)
1 a A 3
2 b E 4
3 a E 5
4 b A 2
and
df2
x y f(x,y)
1 a A 4
2 b E 4
3 a E 4
4 b A 2
If we interpret the columns x and y as influences on some result, then we can say that in the second example (df2) the outcome is independent of column y for x = a. For a report, I would like to collapse all the rows in which y doesn't influence the outcome, so I would like to create df2_out instead of df2 (in order to avoid large tables):
df2_out
x y f(x,y)
1 a - 4
2 b E 4
3 b A 2
whereas df1 should stay as it is, since both x and y have an influence on the outcome:
df1_out
x y f(x,y)
1 a A 3
2 b E 4
3 a E 5
4 b A 2
How can I achieve this in R? Is there a better way to print the data table?
Upvotes: 0
Views: 125
Reputation: 21274
Your expected output indicates you are only interested in adjusting cases where the outcome of f() is independent of y. You can use dplyr methods to do this:
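library(dplyr)
find_independent <- function(data) {
  data %>%
    # attach the number of rows sharing each (x, f) combination
    inner_join(data %>%
                 group_by(x, f) %>%
                 count(),
               by = c("x", "f")) %>%
    # n == 2 means both y values give the same f for this x
    mutate(y = if_else(n == 2, "_", y)) %>%
    # drop the helper count so only x, y and f remain
    select(-n) %>%
    distinct()
}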
find_independent(df1)
x y f
1 a A 3
2 b E 4
3 a E 5
4 b A 2
find_independent(df2)
x y f
1 a _ 4
2 b E 4
3 b A 2
Explanation (using df2 as an example):
First, group_by x and f and count the number of occurrences.
df2 %>% group_by(x, f) %>% count()
# A tibble: 3 x 3
# Groups: x, f [3]
x f n
<chr> <int> <int>
1 a 4 2
2 b 2 1
3 b 4 1
Merge this count back to the original data frame, and for the rows where n == 2, change the value of y to _.
Finally, remove the duplicate rows (which occur where y has no effect on f) using distinct().
Data:
df1 <- structure(list(x = c("a", "b", "a", "b"), y = c("A", "E", "E",
"A"), f = c(3L, 4L, 5L, 2L)), class = "data.frame", row.names = c(NA,
-4L))
df2 <- structure(list(x = c("a", "b", "a", "b"), y = c("A", "E", "E",
"A"), f = c(4L, 4L, 4L, 2L)), class = "data.frame", row.names = c(NA,
-4L))
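The n == 2 test in find_independent relies on every x having exactly two y values. If the real data has more levels of y, one possible variation is to group by x alone and collapse y only when every row for that x shares the same f. This is a minimal sketch, not part of the original approach: the name collapse_independent and the placeholder argument are made up for illustration, and it assumes y is a character column.

```r
library(dplyr)

# Same data as df2 above, repeated so the snippet runs on its own.
df2 <- data.frame(
  x = c("a", "b", "a", "b"),
  y = c("A", "E", "E", "A"),
  f = c(4L, 4L, 4L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace y with a placeholder for every x whose rows
# all share a single value of f, however many y levels there are.
collapse_independent <- function(data, placeholder = "_") {
  data %>%
    group_by(x) %>%
    # Require more than one row, since a single row says nothing about y.
    mutate(y = if (n() > 1 && n_distinct(f) == 1) placeholder else y) %>%
    ungroup() %>%
    # The collapsed rows are now identical, so keep one of each.
    distinct()
}

collapse_independent(df2)
```

On df2 this should give the same three rows as find_independent(df2), returned as a tibble. Note that it is stricter than the n == 2 version: y is collapsed only if f is constant across all rows of a given x, not just across a pair of them.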
Upvotes: 2