Reputation: 21400
I have a df containing words (columns w1, w2, etc.) and their durations, some of which are NA (columns d1, d2, etc.), like this one:
set.seed(47)
df <- data.frame(
  w1 = c(sample(LETTERS[1:4], 10, replace = TRUE)),
  w2 = c(sample(LETTERS[1:4], 10, replace = TRUE)),
  w3 = c(sample(LETTERS[1:4], 10, replace = TRUE)),
  w4 = c(sample(LETTERS[1:4], 10, replace = TRUE)),
  d1 = c(rep(NA, 3), round(rnorm(7), 3)),
  d2 = c(round(rnorm(6), 3), NA, round(rnorm(3), 3)),
  d3 = c(round(rnorm(2), 3), rep(NA, 2), round(rnorm(6), 3)),
  d4 = c(round(rnorm(1), 3), NA, round(rnorm(8), 3))
)
   w1 w2 w3 w4     d1     d2     d3     d4
1   D  A  A  C     NA -2.322 -0.693 -0.488
2   B  C  C  B     NA -1.967  0.261     NA
3   D  A  C  B     NA  0.028     NA  -0.92
4   D  C  A  A -1.566  0.484     NA  0.898
5   C  C  C  D  0.249  0.144  0.507 -0.356
6   C  D  B  B  -0.34   -1.2  0.564  1.032
7   B  B  A  A  0.417     NA  0.061  0.664
8   B  A  A  D -0.326  0.885 -0.109   0.97
9   C  A  C  B  -0.89  0.887 -0.155  1.676
10  D  B  D  C -1.608  0.001   0.95  1.988
What I'd like to get is a single frequency list of all those word tokens that are not NA in the corresponding duration column. So, for example, the "D" in row 1 of column w1 is NA in d1, so this token should not be included in the frequency count.
How is this programmed in base R, ideally in a single line of code?
Upvotes: 0
Views: 40
Reputation: 145755
Ignoring values that are NA in their corresponding columns:
table(unlist(replace(df[paste0("w", 1:4)], is.na(df[paste0("d", 1:4)]), NA)))
#  B  C  D  A
#  7 11  6  9
# Alternate approach
table(unlist(df[1:4])[!is.na(unlist(df[5:8]))])
#  B  C  D  A
#  7 11  6  9
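To see why the alternate approach works, it can help to split it into steps. `unlist()` flattens the word columns and the duration columns in the same column-by-column order, so position i in one vector lines up with position i in the other. A minimal sketch on a small hand-typed data frame (the `toy` frame and its values are made up for illustration, not taken from the question):

```r
# Hypothetical toy data: two word columns and their two duration columns
toy <- data.frame(
  w1 = c("A", "B", "A"),
  w2 = c("B", "B", "C"),
  d1 = c(NA, 0.5, 1.2),
  d2 = c(0.3, NA, 0.7),
  stringsAsFactors = FALSE
)

# Flatten both halves in the same column-by-column order
words     <- unlist(toy[c("w1", "w2")], use.names = FALSE)  # "A" "B" "A" "B" "B" "C"
durations <- unlist(toy[c("d1", "d2")], use.names = FALSE)  # NA 0.5 1.2 0.3 NA 0.7

# Keep only the tokens whose matching duration is present, then count them
freq <- table(words[!is.na(durations)])
freq
# counts: A = 1, B = 2, C = 1
```

Selecting the columns by name rather than by position (`df[1:4]`, `df[5:8]`) is a small design choice that keeps the code working if the column order changes.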
Completely omitting values that have NA anywhere:
It's 3 lines, but I'd do it like this:
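all_words <- unlist(df[1:4])
# words whose duration is NA at least once
na_words <- all_words[is.na(unlist(df[5:8]))]
# drop every occurrence of those words, then count what is left
table(droplevels(all_words[!all_words %in% na_words]))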
# < table of extent 0 >
You could do it in a single line, but it's much uglier and it's very hard to tell what's going on:
table(droplevels(unlist(df[1:4])[! unlist(df[1:4]) %in% unlist(df[1:4])[is.na(unlist(df[5:8]))]]))
For the given sample data, this gives a table of extent 0 because every unique word has an NA duration somewhere. If you change the input data to use more letters, you get non-empty results:
set.seed(47)
df2 <- data.frame(
  w1 = c(sample(LETTERS[1:8], 10, replace = TRUE)),
  w2 = c(sample(LETTERS[1:8], 10, replace = TRUE)),
  w3 = c(sample(LETTERS[1:8], 10, replace = TRUE)),
  w4 = c(sample(LETTERS[1:8], 10, replace = TRUE)),
  d1 = c(rep(NA, 3), round(rnorm(7), 3)),
  d2 = c(round(rnorm(6), 3), NA, round(rnorm(3), 3)),
  d3 = c(round(rnorm(2), 3), rep(NA, 2), round(rnorm(6), 3)),
  d4 = c(round(rnorm(1), 3), NA, round(rnorm(8), 3))
)
table(droplevels(unlist(df2[1:4])[! unlist(df2[1:4]) %in% unlist(df2[1:4])[is.na(unlist(df2[5:8]))]]))
# F A
# 5 4
Upvotes: 1