Reputation: 53
I have a dataset that contains each company's sales for a given year (company code = gvkey, year = fyearq, sales = realsales). After calculating the yearly growth rates for realsales, I try to insert them into the data frame. For some reason, I get the following error message when doing so:
Error in $<-.data.frame(*tmp*, growth_rate, value = c(10041 = NA, :
replacement has 204072 rows, data has 204024
I have already tried removing all NA values, as well as other solutions found on this forum, but unfortunately none of them worked.
The code fragment that yields this error:
rs <- rs[order(rs$gvkey, rs$fyearq, rs$realsales),]
table(is.na(rs$realsales))
rs <- rs %>%
  group_by(gvkey) %>%
  filter(!any(is.na(realsales))) %>%
  ungroup()
rs$growth_rate <- NA
growth_rate <- function(x) {
  out <- c(NA, x[2:length(x)] / x[1:(length(x) - 1)])
  return(out)
}
rs$growth_rate <- do.call("c", by(rs$realsales,rs$gvkey, growth_rate))
It does create a value with all the 204072 elements if I only run
growth_rate <- do.call("c", by(rs$realsales,rs$gvkey, growth_rate))
I don't know if it points to anything but thought it was worth mentioning.
Everything works until it reaches the last line.
Another important thing to point out is, this wasn't happening with the previous dataset. I have changed it a bit to have more observations than the previous one, but it is actually the same, just bigger. Only now I am getting this error. One difference is that I have merged two data frames in order to convert nominal sales to real sales, something I have not done in the previous one. Segment where I do this:
df.gdpdeflator <- read.table("gdpdeflator.txt", header=TRUE)
real_sales <- left_join(sumofsalesbyfirm, df.gdpdeflator, by = "fyearq")
real_sales$realsales <- real_sales$saley/(real_sales$deflator/100)
rs <- aggregate(realsales~gvkey+fyearq, real_sales, sum)
Let me know if further information is required, I'll be happy to provide it.
Upvotes: 0
Views: 231
Reputation: 160687
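Use of 2:length(x) works fine as long as your x is length 2 or more. When x has length 1, 2:length(x) is 2:1, which counts down and yields c(2, 1) instead of an empty index. I believe your intent is to get all but the first element, in which case all of these work: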
x <- 1:10
x[-1]
x[ seq_len(length(x))[-1] ]
tail(x, n=-1)
# [1] 2 3 4 5 6 7 8 9 10
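To connect this to the error in the question, here is a minimal sketch of what the original indexing returns for a group with a single row. It assumes that some gvkey groups contain only one observation, which the question does not confirm. The name growth_rate_orig is only a label for the question's function, used here for illustration.

```r
# The question's function, copied under a hypothetical name.
growth_rate_orig <- function(x) {
  c(NA, x[2:length(x)] / x[1:(length(x) - 1)])
}

# Three observations: three values come back, as intended.
len_three <- length(growth_rate_orig(c(100, 110, 121)))

# One observation: 2:1 and 1:0 both count down, so three values
# come back for a group that has only one row.
len_one <- length(growth_rate_orig(100))

print(c(len_three = len_three, len_one = len_one))
```

If that assumption holds, each single-row group contributes two extra elements, so the 48-row difference in the error message (204072 vs. 204024) would be consistent with 24 such firms. That is an inference from the numbers, not something checked against the data.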
Let me formalize this a little to show several options (wrong and right) and show some output.
allbutfirst <- function(n) {
  sapply(list(
    wrong1 = 2:length(n),
    wrong2 = n[ 2:length(n) ],
    right1 = n[ -1 ],
    right2 = n[ seq_len(length(n))[-1] ],
    right3 = tail(n, n=-1)
  ), paste, collapse = ",")
}
allbutlast <- function(m) {
  sapply(list(
    wrong1 = 1:(length(m)-1),
    wrong2 = m[ 1:max(0, length(m)-1) ],
    right1 = m[ -length(m) ],
    right2 = m[ seq_len(max(0, length(m) - 1)) ],
    right3 = head(m, n=-1)
  ), paste, collapse = ",")
}
allbutfirst(1:5)
# wrong1 wrong2 right1 right2 right3
# "2,3,4,5" "2,3,4,5" "2,3,4,5" "2,3,4,5" "2,3,4,5"
cat(paste(allbutfirst(1:5), collapse = "\n"))
# 2,3,4,5
# 2,3,4,5
# 2,3,4,5
# 2,3,4,5
# 2,3,4,5
cat(paste(allbutfirst(1), collapse = "\n"))
# 2,1
# NA,1
#
#
#
(The wrong labels are there because they go wrong when the length is not 2 or more ...)
The "2,3,4,5" means the returned vector is length four, iterating from 2 to 5. The "2,1" means length two, decrementing from 2 to 1 (when we did not mean to do so). Of course, the NA is just not right.
The empty rows there are relevant: they mean that there were fewer than 2 elements, and nothing was returned (which is what we want). To call out the empty strings, I'll replace them with "", just for show. But they are empty, as they should be.
So this "table" compares the different methods:
                          allbutfirst(x)   allbutlast(x)
x <- 1:5          wrong1  2,3,4,5          1,2,3,4
                  wrong2  2,3,4,5          1,2,3,4
                  right1  2,3,4,5          1,2,3,4
                  right2  2,3,4,5          1,2,3,4
                  right3  2,3,4,5          1,2,3,4
So far so good, no harm yet.
                          allbutfirst(x)   allbutlast(x)
x <- 1            wrong1  2,1              1,0             <-- length 2, expected none
                  wrong2  NA,1             1               <-- 2 or 1, expected 0
                  right1  ""               ""
                  right2  ""               ""
                  right3  ""               ""
x <- integer(0)   wrong1  2,1,0            1,0,-1          <-- length 3? negative?
                  wrong2  NA,NA            NA              <-- all wrong
                  right1  ""               ""
                  right2  ""               ""
                  right3  ""               ""
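Applied to the growth-rate function from the question, one of the safe forms might look like the sketch below. The name growth_rate_safe and the toy data frame are made up for illustration. Using ave() in place of do.call("c", by(...)) is my substitution, not something from the question; it is a base R function that returns one value per input row, in the original row order.

```r
# head()/tail() with a negative n return an empty vector for
# length-1 input, so any non-empty x gives back length(x) values.
growth_rate_safe <- function(x) {
  c(NA, tail(x, n = -1) / head(x, n = -1))
}

# Toy data (hypothetical): firm 1 has three years, firm 2 only one.
rs <- data.frame(
  gvkey     = c(1, 1, 1, 2),
  fyearq    = c(2001, 2002, 2003, 2001),
  realsales = c(100, 110, 121, 50)
)

# ave() applies the function per gvkey and keeps one value per row.
rs$growth_rate <- ave(rs$realsales, rs$gvkey, FUN = growth_rate_safe)
print(rs)
```

Like the original, this returns the ratio of each year's sales to the previous year's, with NA for the first year of every firm; the single-row firm simply gets NA.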
Moral of the story:
- head and tail with negative counts work well
- x[-1] and x[-length(x)] are equivalent to those, and still work well
- seq_len(max(0, ...)) is a safe way of doing things; seq_len(0) will always be empty, 1:0 will not.
Upvotes: 4