Reputation: 435
This might be a basic question, but I haven't found an answer that fits my needs, even though there are many similar ones.
I am trying to select the top 3 values from each column (keeping the row number as an id) but I haven't been able to find the right function for it.
I start with a matrix like the one below, using this code to add the id column:
library(dplyr)
library(tibble)
# Number the rows, then turn the row names into an "id" column
top_probs <- doc_topic_distr %>%
  magrittr::set_rownames(seq_len(nrow(.))) %>%
  as_tibble(rownames = "id")
id V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 1 0.000000000 0.000000000 0.133333333 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.181481481 0.685185185
2 2 0.950000000 0.000000000 0.050000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
3 3 0.028571429 0.114285714 0.814285714 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.014285714 0.028571429
4 4 0.000000000 0.000000000 0.000000000 0.002127660 0.240425532 0.136170213 0.408510638 0.076595745 0.000000000 0.000000000 0.000000000 0.136170213
5 5 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.025000000 0.025000000 0.050000000 0.900000000
6 6 0.000000000 0.000000000 0.000000000 0.000000000 0.076923077 0.384615385 0.000000000 0.000000000 0.000000000 0.000000000 0.284615385 0.253846154
7 7 0.000000000 0.000000000 0.347826087 0.000000000 0.000000000 0.000000000 0.243478261 0.000000000 0.026086957 0.000000000 0.143478261 0.239130435
8 8 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.024000000 0.004000000 0.000000000 0.460000000 0.000000000 0.224000000 0.288000000
9 9 0.000000000 0.000000000 0.311111111 0.000000000 0.011111111 0.000000000 0.011111111 0.000000000 0.000000000 0.000000000 0.388888889 0.277777778
10 10 0.000000000 0.466666667 0.000000000 0.000000000 0.000000000 0.266666667 0.200000000 0.000000000 0.066666667 0.000000000 0.000000000 0.000000000
11 11 0.000000000 0.153333333 0.006666667 0.000000000 0.000000000 0.826666667 0.000000000 0.013333333 0.000000000 0.000000000 0.000000000 0.000000000
12 12 0.295833333 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.404166667 0.004166667 0.000000000 0.000000000 0.000000000 0.295833333
13 13 0.000000000 0.000000000 0.000000000 0.000000000 0.009090909 0.790909091 0.154545455 0.009090909 0.009090909 0.000000000 0.027272727 0.000000000
14 14 0.000000000 0.155555556 0.000000000 0.000000000 0.000000000 0.033333333 0.033333333 0.011111111 0.000000000 0.533333333 0.011111111 0.222222222
15 15 0.055555556 0.000000000 0.533333333 0.000000000 0.000000000 0.000000000 0.177777778 0.005555556 0.000000000 0.000000000 0.227777778 0.000000000
16 16 0.000000000 0.153333333 0.006666667 0.000000000 0.000000000 0.826666667 0.000000000 0.013333333 0.000000000 0.000000000 0.000000000 0.000000000
17 17 0.295833333 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.404166667 0.004166667 0.000000000 0.000000000 0.000000000 0.295833333
18 18 0.000000000 0.000000000 0.000000000 0.000000000 0.009090909 0.790909091 0.154545455 0.009090909 0.009090909 0.000000000 0.027272727 0.000000000
19 19 0.000000000 0.155555556 0.000000000 0.000000000 0.000000000 0.033333333 0.033333333 0.011111111 0.000000000 0.533333333 0.011111111 0.222222222
20 20 0.055555556 0.000000000 0.533333333 0.000000000 0.000000000 0.000000000 0.177777778 0.005555556 0.000000000 0.000000000 0.227777778 0.000000000
Now, I want to know if there is a way to use top_frac() on every column at once. That is, I want to keep 20% of my data, made up of the same number of highest-probability rows from each column. For example, if 20% of the whole data were 120 rows, I would get a matrix combining the 10 highest-probability rows for each of the 12 columns. This would be easy to do for a single column, but I don't know how to do it proportionally across all of them.
Upvotes: 0
Views: 509
Reputation: 336
Following up from above, it would be something like:
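library(dplyr)
library(tidyr)
df %>%
  gather(column, value, -id) %>%  # reshape to long: one row per id/column pair
  group_by(column) %>%            # group by column only; adding id leaves one value per group
  top_n(3, value)                 # keep the 3 largest values per column (ties are all kept)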
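For the proportional version asked about in the question, a minimal sketch might look like the following. It assumes dplyr 1.0.0 or later (for slice_max()) and tidyr 1.0.0 or later (for pivot_longer()). The toy tibble is a hypothetical stand-in for top_probs, and the names top_by_column and selected are only illustrative.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for top_probs: 10 rows, 3 topic columns
toy <- tibble(
  id = as.character(1:10),
  V1 = (1:10) / 10,
  V2 = (10:1) / 10,
  V3 = c(0.5, 0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.0)
)

# Long format, then keep the top 20% of rows within each column.
# with_ties = FALSE caps every column at the same number of rows.
top_by_column <- toy %>%
  pivot_longer(-id, names_to = "column", values_to = "value") %>%
  group_by(column) %>%
  slice_max(value, prop = 0.2, with_ties = FALSE) %>%
  ungroup()

# Original wide rows whose id was selected for at least one column
selected <- toy %>%
  filter(id %in% top_by_column$id)
```

With many exact zeros, as in the data above, with_ties = FALSE breaks ties by row order, so which zero rows survive is arbitrary. Dropping that argument keeps all tied rows instead, at the cost of uneven counts per column.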
Upvotes: 2