Jassy.W

Reputation: 539

How to filter data in a raw data set without specific variables for each column

I have a raw dataset that looks like this:

a619    a6641   a6672   a6741   a686    a6876   a689    a6946   a691
a6976   a40     a4019   b409    b4147   b4111   b416    b4167   b4178
b4186   b4198   b421    b4261   b4211   b4266   b4614   t4641   t4667
t4677   t4681   t4466   t4161   t4149   t4170   t4602   t4664   t461    
t4691t  t4764   t4767   f4792   f4948   f4988   f1086   f1168   f1184       
f1189   f1207   f1222   f1691   f1429   k1468   k1467   k1162   k1149   
k1619   k1666   k1669   k1767   k1719   k1772   k1776   k1782   p1827   
p1872   p1914   p1921   p1914   p1992   p6      p6094   p6106   p6164   
p6114   p6261   w6627   w6671   w6416   w6466   w6469   w6171   w6194
w6666   w6884   w6911   w7      w70     w7016   g7011   g7076   g7091   
g7164   g7191   g7266   g7621   g7406   g7426   g7426   g7467   g7106

Put the raw data in a file named data.txt and use the following code to read it into a data frame:

 library(data.table)
 data <- fread("C:\\Desktop\\data.txt", header = FALSE)

My desired output is to pick out the elements with 'k' as the first letter:

k1468   k1467   k1162   k1149   k1619   k1666   k1669   k1767   k1719   k1772   k1776   k1782

There are no specific variables corresponding to each column. The only feature I have found in this raw data is that each chunk of values starts with a different first letter. I want to extract the values whose first letter is 'k', that is, from k1468 to k1782. What syntax can achieve this in R?

Upvotes: 1

Views: 92

Answers (1)

Pul_P

Reputation: 90

Since you want a vector of required values, try converting your matrix into a vector and then do an sapply as below:

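d <- c()  # collects the matching values
# Append every element whose first character is 'k' to d; invisible() hides the list of NULLs that sapply returns.
invisible(sapply(as.vector(your_data_matrix), function(x) { if (substr(x, 1, 1) == 'k') d <<- c(d, x) }, USE.NAMES = FALSE))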

Your required output will be stored in d.

EDIT: For a data.table you will have to unlist and then do the sapply as follows:

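d <- c()  # collects the matching values
# unlist() flattens the data.table column by column before the same first-letter check is applied.
invisible(sapply(as.vector(unlist(your_data_table)), function(x) { if (substr(x, 1, 1) == 'k') d <<- c(d, x) }, USE.NAMES = FALSE))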
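
A vectorized alternative may be simpler than growing d inside sapply. The sketch below is only an illustration: your_data_table here is a small hypothetical stand-in built with data.frame (a few values copied from the question), not the real result of fread, and it assumes every column is read as character. Note that unlist flattens column by column, so to keep the values in the order they appear in the file the table is transposed first.

```r
# Hypothetical stand-in for the table returned by fread(); only a few of the
# question's values are reproduced here.
your_data_table <- data.frame(
  V1 = c("f1189", "k1619", "p1872"),
  V2 = c("k1468", "k1666", "p1914"),
  V3 = c("k1467", "k1669", "p6"),
  stringsAsFactors = FALSE
)

# Flatten row by row: transposing makes each original row a column, and
# as.vector() then reads the matrix column by column.
values <- as.vector(t(as.matrix(your_data_table)))

# Keep only the entries whose first character is 'k' (base R, version 3.3.0 or later).
d <- values[startsWith(values, "k")]
print(d)
```

On older versions of R, `values[substr(values, 1, 1) == "k"]` or `grep("^k", values, value = TRUE)` should select the same elements.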

Upvotes: 1
