Reputation: 3
I have a data frame where the columns represent patients of various ages, and another data frame with the ages of those patients. I want to subset the data so that only patients below the age of 50 are displayed.
> dat
GSM27015.26.M GSM27016.26.M GSM27018.29.M GSM27021.37.M GSM27023.40.M GSM27024.42.M
31307_at 179.86300 106.495000 265.58600 301.24300 218.50900 224.61000
31308_at 559.07800 411.483000 481.17600 570.73300 333.53900 370.07900
31309_r_at 20.76970 30.641500 50.21530 42.68920 27.10590 21.57620
31310_at 154.19100 224.446000 188.82300 177.86300 233.46300 120.90800
31311_at 956.79700 648.310000 933.65600 1016.41000 762.01300 1040.29000
And the annotation data frame with the patients' ages:
> ann
Gender Age
GSM27015 M 26
GSM27016 M 26
GSM27018 M 29
GSM27021 M 37
GSM27023 M 40
GSM27024 M 42
GSM27025 M 45
GSM27027 M 52
GSM27028 M 53
Upvotes: 0
Views: 74
Reputation: 886938
An option with readr::parse_number(): remove the sample ID up to the first dot, then parse the age that follows it.
library(stringr)
dat[readr::parse_number(str_remove(names(dat), "^[^.]+\\.")) < 50]
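If you'd rather avoid the readr/stringr dependencies, the same two steps can be sketched in base R. str_remove() removes the first match of its pattern, which sub() with an empty replacement also does; the second sub() stands in for parse_number() here by keeping only the digits before the next dot. The column names are copied from the question:

```r
nm <- c("GSM27015.26.M", "GSM27016.26.M", "GSM27018.29.M",
        "GSM27021.37.M", "GSM27023.40.M", "GSM27024.42.M")

# Step 1: drop the sample ID and the first dot (what str_remove() does above).
rest <- sub("^[^.]+\\.", "", nm)            # "26.M", "26.M", "29.M", ...

# Step 2: keep the number before the next dot (the role of parse_number() here).
age <- as.numeric(sub("\\..*$", "", rest))
age
#> [1] 26 26 29 37 40 42

# With the real data you would then subset with: dat[age < 50]
```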
Upvotes: 0
Reputation: 30474
Here's something else to consider.
You could transpose your data so that patients are rows rather than columns. Since the column names appear to contain each patient's age and gender, you can split those out into additional columns as well.
dat_new <- cbind(do.call(rbind, strsplit(colnames(dat), '\\.')), as.data.frame(t(dat)))
colnames(dat_new)[1:3] <- c("id", "age", "gender")
dat_new$age <- as.numeric(as.character(dat_new$age))  # strsplit() returns text; make age numeric
rownames(dat_new) <- NULL
This is what it would look like:
id age gender 31307_at 31308_at 31309_r_at 31310_at 31311_at
1 GSM27015 26 M 179.863 559.078 20.7697 154.191 956.797
2 GSM27016 26 M 106.495 411.483 30.6415 224.446 648.310
3 GSM27018 29 M 265.586 481.176 50.2153 188.823 933.656
4 GSM27021 37 M 301.243 570.733 42.6892 177.863 1016.410
5 GSM27023 40 M 218.509 333.539 27.1059 233.463 762.013
6 GSM27024 42 M 224.610 370.079 21.5762 120.908 1040.290
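Note that strsplit() returns character strings, so an age column built this way starts out as text, and comparing text with a number falls back to string comparison. (In R versions before 4.0, data.frame() defaults to stringsAsFactors = TRUE, so the column may be a factor instead; as.numeric() on a factor returns level codes, which is why as.numeric(as.character(x)) is the safer conversion.) A quick illustration with hypothetical ages stored as strings:

```r
# Hypothetical ages stored as text, as strsplit() would return them.
ages_chr <- c("26", "9", "100")

# Character vs number: 50 is coerced to "50" and compared as text.
ages_chr < 50
#> [1]  TRUE FALSE  TRUE

# Converting first gives the intended numeric comparison.
as.numeric(ages_chr) < 50
#> [1]  TRUE  TRUE FALSE
```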
Then, if you wish to subset based on age (e.g., patients under 50), you can do:
dat_new[dat_new$age < 50, ]
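If you would rather take the ages from ann than from the column names, you can match the sample ID at the start of each name against rownames(ann). Below is a minimal base R sketch, assuming ann's row names are exactly those IDs. The small dat copies two probes and three samples from the question. The GSM27027.52.M column is a hypothetical placeholder (all NA, no real values) added only so that one sample is 50 or older:

```r
# Two probes and three samples copied from the question's dat, plus one
# hypothetical placeholder column (values unknown, so NA) for an older patient.
dat <- data.frame(
  "GSM27015.26.M" = c(179.863, 559.078),
  "GSM27021.37.M" = c(301.243, 570.733),
  "GSM27024.42.M" = c(224.610, 370.079),
  "GSM27027.52.M" = NA_real_,
  row.names = c("31307_at", "31308_at"),
  check.names = FALSE
)

# The annotation table from the question, with sample IDs as row names.
ann <- data.frame(
  Gender = rep("M", 9),
  Age = c(26, 26, 29, 37, 40, 42, 45, 52, 53),
  row.names = c("GSM27015", "GSM27016", "GSM27018", "GSM27021", "GSM27023",
                "GSM27024", "GSM27025", "GSM27027", "GSM27028")
)

ids <- sub("\\..*$", "", names(dat))           # "GSM27015.26.M" -> "GSM27015"
age <- ann$Age[match(ids, rownames(ann))]      # NA for IDs missing from ann
young <- dat[, which(age < 50), drop = FALSE]  # which() skips any NA ages
names(young)
#> [1] "GSM27015.26.M" "GSM27021.37.M" "GSM27024.42.M"
```

Using which() instead of a plain logical index keeps the subset well-defined if some column ID has no row in ann, since NA ages are simply skipped.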
Upvotes: 1
Reputation: 101024
Perhaps try
# Capture the digits between the first and second dot (the age), then keep ages below 50
dat[as.numeric(gsub(".*?\\.(\\d+)\\..*", "\\1", names(dat))) < 50]
Upvotes: 0
Reputation: 11584
Does this work?
> library(dplyr)
> data
GSM27015.26.M GSM27016.26.M GSM27018.29.M GSM27021.37.M GSM27023.40.M GSM27024.42.M GSM27024.52.M
31307_at 179.8630 106.4950 265.5860 301.2430 218.5090 224.6100 331.230
31308_at 559.0780 411.4830 481.1760 570.7330 333.5390 370.0790 370.079
31309_r_at 20.7697 30.6415 50.2153 42.6892 27.1059 21.5762 98998.000
31310_at 154.1910 224.4460 188.8230 177.8630 233.4630 120.9080 120.908
31311_at 956.7970 648.3100 933.6560 1016.4100 762.0130 1040.2900 1000.290
> data %>% select_if(as.numeric(gsub('GSM\\d{5}\\.(\\d{2})..','\\1',names(data))) < 50)
GSM27015.26.M GSM27016.26.M GSM27018.29.M GSM27021.37.M GSM27023.40.M GSM27024.42.M
31307_at 179.8630 106.4950 265.5860 301.2430 218.5090 224.6100
31308_at 559.0780 411.4830 481.1760 570.7330 333.5390 370.0790
31309_r_at 20.7697 30.6415 50.2153 42.6892 27.1059 21.5762
31310_at 154.1910 224.4460 188.8230 177.8630 233.4630 120.9080
31311_at 956.7970 648.3100 933.6560 1016.4100 762.0130 1040.2900
I added one more column, "GSM27024.52.M" (age 52), to your data, and it was not selected in the output.
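One caveat with this pattern: (\\d{2}).. assumes every age has exactly two digits. The sketch below uses one hypothetical single-digit name (GSM99999.7.F, not from the question) to compare it with a variant that allows any number of digits and matches the suffix from the second dot:

```r
nm <- c("GSM27015.26.M", "GSM27024.52.M", "GSM99999.7.F")  # last name is hypothetical

# Pattern from the answer: exactly two age digits, then any two characters.
# A name that does not match is returned unchanged, so as.numeric() gives NA.
two_digit <- suppressWarnings(
  as.numeric(gsub("GSM\\d{5}\\.(\\d{2})..", "\\1", nm))
)
two_digit
#> [1] 26 52 NA

# Allowing any number of digits, then consuming the second dot and the suffix.
any_digits <- as.numeric(gsub("GSM\\d{5}\\.(\\d+)\\..*", "\\1", nm))
any_digits
#> [1] 26 52  7
```

The resulting logical vector (e.g. any_digits < 50) can be passed to select_if() in the same way as in the answer.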
Upvotes: 0