Reputation: 861
My question has two parts.
First, lets say I have long vectors
like years
, cities
, and temp
below, and
I want to create a data table that includes
combinations of all the three vectors.
For simplicity I create a short example here,
by cheating and using cbind
:
years <- rep(c(1950:1951), 2)
cities <- c(rep("LA", 2), rep("NYC", 2))
temp <- c(10, 20, NA, 40)
dat_tb <- cbind(years, cities, temp) %>% data.table()
dat_tb
years cities temp
1: 1950 LA 10
2: 1951 LA 20
3: 1950 NYC <NA>
4: 1951 NYC 40
My question is that if the vectors are large,
I have to compute number of repetitions I had created above,
to enable me to use cbind
.
How can I create a data table like dat_tb
above
just using the vectors below, without using cbind
years = c(1950, 1951)
cities = c("LA", "NYC")
temp = (10, 20, NA, 40)
The second part of my question is that lets say I accidentally dropped the NA values like so
dat_tb <- na.omit(dat_tb)
Now, I do not know where the NA was, so, I use the answer of the question in the first part, to initiate a data table that looks like:
years cities temp
1: 1950 LA 666
2: 1951 LA 666
3: 1950 NYC 666
4: 1951 NYC 666
How can I convert the 666's using the data tale I have, after dropping the NA's:
years cities temp
1: 1950 LA 10
2: 1951 LA 20
3: 1951 NYC 40
to go back to the data table with missing value
years cities temp
1: 1950 LA 10
2: 1951 LA 20
3: 1950 NYC 666
4: 1951 NYC 40
Upvotes: 2
Views: 116
Reputation: 11255
One can directly make a table with data.table
:
library(data.table)
years <- rep(c(1950:1951), 2)
cities <- c(rep("LA", 2), rep("NYC", 2))
temp <- c(10, 20, NA, 40)
data.table(years, cities, temp)
years cities temp
1: 1950 LA 10
2: 1951 LA 20
3: 1950 NYC NA
4: 1951 NYC 40
It also works when you have unequal vector length. Shorter vectors are repeated until they equal the length of the longest one. This means that it's not an exact match because of how it repeats (see NYC has the NA in the first example while LA does here):
years = c(1950, 1951)
cities = c("LA", "NYC")
temp = (10, 20, NA, 40)
data.table(years, cities, temp)
years cities temp
1: 1950 LA 10
2: 1951 NYC 20
3: 1950 LA NA
4: 1951 NYC 40
For your second question, I don't follow, but you determine missing rows with an anti-join.
dat_tb2 <- na.omit(copy(dat_tb))
dat_tb[!dat_tb2, on = .(years,cities)]
years cities temp
1: 1950 NYC <NA>
union(dat_tb2, dat_tb[!dat_tb2, on = .(years,cities)])
years cities temp
1: 1950 LA 10
2: 1951 LA 20
3: 1951 NYC 40
4: 1950 NYC <NA>
Of course, you lose the row order but then again you should be careful before randomly omitting data :)
Upvotes: 1