user330
user330

Reputation: 1270

How I can sort this data out in R

My data is similar to these data. However, I have more than these columns

n1<-10
n2<-11
n3<-12
n4<-13
n5<-14

df<-read.table (text=" A1   A2  A3  A4  A5
        12  11  11  12  14
        13  10  13  14  12
        8   12  14  14  16
        16  14  16  12  13
        13  19  14  11  11
        ", header=TRUE)

I want to get the following table:

O1  O2  O3  O4  O5
Y   Y   N   N   Y
Y   N   Y   Y   N
N   Y   Y   Y   Y
Y   Y   Y   N   N
Y   Y   Y   N   N
1   1   1   3   3

The logic is that, for example, in the O1 column, if the values are >= n1, it gets Y, else N. For the O2 column, n2 is considered and n3 for O3 and n4 for O4 and n5 goes to O5. Finally, I want to get the count Ns for each column.

Thank you for your help in advance!

Upvotes: 1

Views: 77

Answers (3)

akrun
akrun

Reputation: 886998

We can also use a list

df[] <- c("N", "Y")[1+(df >= mget(ls(pattern = "^n\\d+$")))]

Or it can be also done by paste and mget

df[] <- c("N", "Y")[1+(df >= mget(paste0("n", 1:5)))]

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 101159

A simple base R option using t

df[] <- c("N", "Y")[t(t(df) >= c(n1, n2, n3, n4, n5)) + 1]

gives

> df
  A1 A2 A3 A4 A5
1  Y  Y  N  N  Y
2  Y  N  Y  Y  N
3  N  Y  Y  Y  Y
4  Y  Y  Y  N  N
5  Y  Y  Y  N  N

and then you use rbind to get

> rbind(df, colSums(df == "N"))
  A1 A2 A3 A4 A5
1  Y  Y  N  N  Y
2  Y  N  Y  Y  N
3  N  Y  Y  Y  Y
4  Y  Y  Y  N  N
5  Y  Y  Y  N  N
6  1  1  1  3  3

Upvotes: 2

Jon Spring
Jon Spring

Reputation: 66415

I'm sure there's a more concise way to do this, but here's how I would think about it:

library(tidyverse)

First, convert your n's to a lookup table. I'd rather keep them all in one place rather than as separate variables. This also would make it more convenient if the column names are programmatically determined, e.g. we could use col = paste0("A", 1:5) here instead.

n <- tibble(col = c("A1", "A2", "A3", "A4", "A5"),
            n = c(10, 11, 12, 13, 14))

Then convert df to long format, join to n, do the comparison, and then pivot wide again:

df %>% 
  mutate(row = row_number()) %>%
  pivot_longer(cols = -row, names_to = "col", values_to = "val") %>%
  left_join(n) %>%
  mutate(greater = if_else(val >= n, "Y", "N")) %>%
  select(row, col, greater) %>%
  pivot_wider(names_from = col, values_from = greater) %>%
  select(-row)

Result

# A tibble: 5 x 5
  A1    A2    A3    A4    A5   
  <chr> <chr> <chr> <chr> <chr>
1 Y     Y     N     N     Y    
2 Y     N     Y     Y     N    
3 N     Y     Y     Y     Y    
4 Y     Y     Y     N     N    
5 Y     Y     Y     N     N

To add the totals in the last row, here's a way. Let's say the output above was saved to temp. Then:

bind_rows(
  temp, 
  temp %>% 
    summarize(
      across(
        everything(),
        ~sum(.x == "Y") %>%
          as.character())))

# A tibble: 6 x 5
  A1    A2    A3    A4    A5   
  <chr> <chr> <chr> <chr> <chr>
1 Y     Y     N     N     Y    
2 Y     N     Y     Y     N    
3 N     Y     Y     Y     Y    
4 Y     Y     Y     N     N    
5 Y     Y     Y     N     N    
6 4     4     4     2     2    

Upvotes: 1

Related Questions