iskandarblue

Reputation: 7526

Extracting numbers of a certain length from a list

I have a list of thousands of elements. Some elements contain years, stored as four-digit strings; others contain unrelated numbers that I need to get rid of.

I need to extract from the list only the numbers that are four digits long and remove all other numbers. In the end I need a data frame with 20 rows and columns containing the years that are nested in the list. For the sample below, I need a table that looks like this:

> sample_years
   element year year.1 year.2 year.3
1        1   NA     NA     NA     NA
2        2   NA   1918     NA     NA
3        3   NA     NA     NA     NA
4        4   NA     NA     NA     NA
5        5   NA   1912   1913     NA
6        6   NA   1893   1898   1925
7        7   NA   1820   1830   1899
8        8   NA     NA     NA     NA
9        9   NA   1808   1810   1854
10      10   NA     NA     NA     NA
11      11   NA     NA     NA     NA
12      12   NA   1885     NA     NA
13      13   NA   1900     NA     NA
14      14   NA   1926   1933     NA
15      15   NA     NA     NA     NA
16      16   NA     NA     NA     NA
17      17   NA   1870     NA     NA
18      18   NA     NA   1923     NA
19      19   NA     NA     NA     NA
20      20   NA     NA     NA     NA


> dput(sample)
list(c("", "2"), c("", "1918"), "", "", c("", "1912", "1913"), 
    c("", "1893", "1898", "1925", "1993"), c("", "1820", "1830", 
    "1899", "1900"), "", c("", "1808", "1810", "1854", "1905", 
    "1907"), "", "", c("", "1885"), c("", "1900"), c("", "1926", 
    "1933"), "", "", c("", "1870"), c("", "1", "1923"), "", "")

Upvotes: 0

Views: 135

Answers (2)

lmo

Reputation: 38500

I think sapply is what you are looking for. For your list named sample, this counts the non-empty strings in each element:

sapply(sample, function(i) sum(i != ""))

You can then extract the elements of the list that fit your criterion as follows:

myNewSample <- sample[sapply(sample, function(i) sum(i != "")) == 4]

As a side note, it is not advisable to use "sample" as the name of your list object, because it masks a commonly used base R function. See ?sample.
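
If the goal is to keep only the four-digit strings inside each element, rather than to select whole elements by their count, a minimal sketch along these lines may help. The name `year_list` and the shortened data are assumptions made for illustration; the anchored pattern `^[0-9]{4}$` is one way to express "exactly four digits".

```r
# Shortened version of the question's data; `year_list` is a hypothetical name
year_list <- list(c("", "2"), c("", "1918"), "", c("", "1912", "1913"), c("", "1", "1923"))

# Keep only the strings made of exactly four digits
years_only <- lapply(year_list, function(el) el[grepl("^[0-9]{4}$", el)])

# Number of year-like strings left in each element
lengths(years_only)
# [1] 0 1 0 2 1
```

Elements without any four-digit string come back as character(0), so the list keeps its original length.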

Upvotes: 1

Sotos

Reputation: 51582

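We can use rbind.fill from the plyr package to bind the list elements into a data frame, and then grepl to set every cell that is not a four-digit string to NA:

library(plyr)

# lst1 is the list from the question; each element becomes a one-row data frame
df <- rbind.fill(lapply(lst1, function(i) as.data.frame(t(i))))
# Anchored pattern: only cells that are exactly four digits survive
df[!apply(df, 1:2, function(i) grepl('^[0-9]{4}$', i))] <- NA
head(df)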
#    V1   V2   V3   V4   V5   V6
#1 <NA> <NA> <NA> <NA> <NA> <NA>
#2 <NA> 1918 <NA> <NA> <NA> <NA>
#3 <NA> <NA> <NA> <NA> <NA> <NA>
#4 <NA> <NA> <NA> <NA> <NA> <NA>
#5 <NA> 1912 1913 <NA> <NA> <NA>
#6 <NA> 1893 1898 1925 1993 <NA>
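
For readers who would rather avoid the plyr dependency, the same idea can be sketched in base R. This is an illustrative variant, not the approach above: the shortened `lst1`, the names `years`, `padded` and `year_df`, and the `year1`, `year2` column labels are all assumptions. Note that it packs the years to the left, so the original position of a year within its element is not preserved.

```r
# Shortened version of the question's data; `lst1` follows the name used above
lst1 <- list(c("", "2"), c("", "1918"), "", c("", "1912", "1913"), c("", "1", "1923"))

# Keep only the strings that are exactly four digits long
years <- lapply(lst1, function(el) el[grepl("^[0-9]{4}$", el)])

# Pad every element with NA up to the longest one (at least one column)
n_col <- max(1L, lengths(years))
padded <- lapply(years, function(y) as.integer(c(y, rep(NA, n_col - length(y)))))

# Stack the padded vectors into a data frame, one row per list element
year_df <- data.frame(element = seq_along(lst1), do.call(rbind, padded))
names(year_df)[-1] <- paste0("year", seq_len(n_col))
year_df
```

Converting to integer is optional; dropping as.integer() keeps the years as character strings, as in the output above.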

Upvotes: 2
