Reputation: 7526
I have a list with thousands of elements. Some elements contain years, which are four-digit strings; others contain unrelated numbers that I need to get rid of.
I need to extract only the strings that are four digits long and drop everything else. In the end I need a data frame with one row per list element (20 rows for the sample below) and columns holding the years nested in that element. For the sample below, I need a table that looks like this:
> sample_years
element year year.1 year.2 year.3
1 1 NA NA NA NA
2 2 NA 1918 NA NA
3 3 NA NA NA NA
4 4 NA NA NA NA
5 5 NA 1912 1913 NA
6 6 NA 1893 1898 1925
7 7 NA 1820 1830 1899
8 8 NA NA NA NA
9 9 NA 1808 1810 1854
10 10 NA NA NA NA
11 11 NA NA NA NA
12 12 NA 1885 NA NA
13 13 NA 1900 NA NA
14 14 NA 1926 1933 NA
15 15 NA NA NA NA
16 16 NA NA NA NA
17 17 NA 1870 NA NA
18 18 NA NA 1923 NA
19 19 NA NA NA NA
20 20 NA NA NA NA
> dput(sample)
list(c("", "2"), c("", "1918"), "", "", c("", "1912", "1913"),
c("", "1893", "1898", "1925", "1993"), c("", "1820", "1830",
"1899", "1900"), "", c("", "1808", "1810", "1854", "1905",
"1907"), "", "", c("", "1885"), c("", "1900"), c("", "1926",
"1933"), "", "", c("", "1870"), c("", "1", "1923"), "", "")
Upvotes: 0
Views: 135
Reputation: 38500
I think sapply is what you are looking for. For your list named sample, this counts the non-empty strings in each element:
sapply(sample, function(i) sum(i != ""))
You can then keep the list elements that have exactly four non-empty entries as follows:
myNewSample <- sample[sapply(sample, function(i) sum(i != "")) == 4]
On a side note, it is not advisable to use "sample" as the name of your list object, as it is a fairly important function in R. See ?sample.
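If the criterion is the length of each string rather than the number of entries, a pattern match per element is one option. This is a minimal sketch in base R, assuming a small made-up list named lst (a hypothetical name, chosen so that sample() is not masked):

```r
# Hypothetical example data, a shortened version of the list in the question
lst <- list(c("", "2"), c("", "1918"), "", c("", "1912", "1913"), c("", "1", "1923"))

# Within each element, keep only the strings made of exactly four digits
years <- lapply(lst, function(x) x[grepl("^[0-9]{4}$", x)])

# Number of four-digit strings found in each element
lengths(years)
```

Elements with no match come back as character(0), so the result still has one entry per original element.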
Upvotes: 1
Reputation: 51582
We can use rbind.fill from the plyr package to bind the list into a data frame, and then grepl to handle your condition:
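library(plyr)
# lst1 is the list from the question; each element becomes one row
df <- rbind.fill(lapply(lst1, function(i) as.data.frame(t(i))))
# blank out every cell that is not exactly four digits
df[!apply(df, 1:2, function(i) grepl('^[0-9]{4}$', i))] <- NA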
head(df)
# V1 V2 V3 V4 V5 V6
#1 <NA> <NA> <NA> <NA> <NA> <NA>
#2 <NA> 1918 <NA> <NA> <NA> <NA>
#3 <NA> <NA> <NA> <NA> <NA> <NA>
#4 <NA> <NA> <NA> <NA> <NA> <NA>
#5 <NA> 1912 1913 <NA> <NA> <NA>
#6 <NA> 1893 1898 1925 1993 <NA>
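For comparison, the same idea can be sketched without plyr. This is only an illustration in base R under assumed names (lst, years, years_df and the year1, year2 column names are all made up here), and unlike the approach above it packs the years to the left instead of keeping their original positions:

```r
# Hypothetical example data, a shortened version of the list in the question
lst <- list(c("", "2"), c("", "1918"), "", c("", "1912", "1913"), c("", "1", "1923"))

# Keep only four-digit strings in each element and convert them to integers
years <- lapply(lst, function(x) as.integer(x[grepl("^[0-9]{4}$", x)]))

# Pad every element with NA up to the longest one, then stack into a matrix
width <- max(lengths(years))
padded <- do.call(rbind, lapply(years, `length<-`, width))

# One row per list element, one column per year slot
years_df <- data.frame(element = seq_along(lst), padded)
names(years_df)[-1] <- paste0("year", seq_len(width))
years_df
```

The sketch assumes at least one element contains a year; if none does, width is 0 and the padding step would need a special case.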
Upvotes: 2