user10072578


Selecting columns with only one character in R

Here is my data

df <- read.table(text = "A1 A2 AA2 A3 APP3 AA4 A4
17 17 14 18 18 14 17
16 15 13 16 19 15 19
17 14 12 19 15 18 14
17 16 16 18 19 19 20
19 18 12 18 13 17 17
12 19 17 18 16 20 18
20 18 14 13 15 15 16
18 20 12 20 12 12 18
12 15 18 14 16 18 18", header = TRUE)

I want to select columns that have only one A, i.e.,

A1  A2  A3  A4
17  17  18  17
16  15  16  19
17  14  19  14
17  16  18  20
19  18  18  17
12  19  18  18
20  18  13  16
18  20  20  18
12  15  14  18

I have used the following code:

library(dplyr)

df1 <- df %>%
  select(contains("A"))

but it returns every column whose name contains an A, including AA2, APP3 and AA4.

Is it possible to get table 2? Thanks for your help.

Upvotes: 1

Views: 609

Answers (3)

Oriol Prat

Reputation: 1047

If you are not familiar with regular expressions, you can use stringr, a popular package for working with strings. One line is enough:

library(stringr)
df[,str_count(names(df),'A')==1]

Upvotes: 0

Gregor Thomas

Reputation: 146164

You can use matches() with a regex pattern. A pattern for "contains exactly 1 'A'" would be "^[^A]*A[^A]*$". Note that APP3 also contains exactly one A, so this pattern keeps it:

df %>% select(matches("^[^A]*A[^A]*$"))
#   A1 A2 A3 APP3 A4
# 1 17 17 18   18 17
# 2 16 15 16   19 19
# 3 17 14 19   15 14
# 4 17 16 18   19 20
# ...

Based on comments, my best guess for what you want is columns where the name starts with a P and after the P contains only numbers:

# single P followed by numbers
df %>% select(matches("^P[0-9]+$"))

# single A followed by numbers
df %>% select(matches("^A[0-9]+$"))

# single capital letter followed by numbers
df %>% select(matches("^[A-Z][0-9]+$"))
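To see which column names a pattern keeps before touching the data, you can try it on the names alone with base R's grepl(). This is only a sketch: the vector nms is typed in by hand from the question's data and stands in for names(df).

```r
nms <- c("A1", "A2", "AA2", "A3", "APP3", "AA4", "A4")

# exactly one "A" anywhere in the name (APP3 qualifies too)
one_a <- nms[grepl("^[^A]*A[^A]*$", nms)]
print(one_a)
# "A1" "A2" "A3" "APP3" "A4"

# a single "A" followed only by digits
a_digits <- nms[grepl("^A[0-9]+$", nms)]
print(a_digits)
# "A1" "A2" "A3" "A4"
```

Only the second pattern reproduces the table asked for in the question, because APP3 has one A but extra letters after it.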

Upvotes: 1

M.Bergen

Reputation: 174

If you're not very comfortable with regex, here's an alternative solution.

The first step is to write a function that counts the number of "A"s in each element of a character vector. It removes every A from the column names and subtracts the resulting number of characters from the original number.

count_a <- function(vector, char = "A") {
  # remove every occurrence of `char`, then compare string lengths
  stripped <- gsub(char, "", vector, fixed = TRUE)
  nchar(vector) - nchar(stripped)
}

Once you have this function you simply apply it to the colnames of your dataset and then limit your data to the columns where the count is equal to one.

As<-count_a(colnames(df))
df[,As==1]
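To check the counts before subsetting, here is a small sketch that applies the same subtraction idea to the question's column names. The vector nms is typed in by hand and stands in for colnames(df).

```r
nms <- c("A1", "A2", "AA2", "A3", "APP3", "AA4", "A4")

# characters lost when every "A" is removed = number of "A"s in each name
n_a <- nchar(nms) - nchar(gsub("A", "", nms, fixed = TRUE))
names(n_a) <- nms
print(n_a)

# names with exactly one "A"; note that APP3 is among them
keep <- nms[n_a == 1]
print(keep)
```

If APP3 should be excluded, a stricter test on the names, such as grepl("^A[0-9]+$", nms), is one option.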

Upvotes: 1
