Reputation: 71
I don't know is question is described properly, but I'd like to check if every group has same values in column file_month
and if no, then insert missing value by using next available row. So for example my data looks like this:
file_month CountryName ProductName target_fe1 target_fe2
<int> <chr> <chr> <dbl> <dbl>
1 7 Germany Inflation 7.52 4.84
2 9 Germany Inflation 7.92 4.74
3 7 USA Inflation 8.02 3.72
4 8 USA Inflation 8.02 3.72
5 9 USA Inflation 8.08 3.92
It is grouped by CountryName and ProductName. Rows, where CountryName is "Germany", are missing file_month = 8, it only has file month = c(7,9), while every country should have file month = c(7,8,9).
I would like to fill this missing row in the way that it uses values from next row:
file_month CountryName ProductName target_fe1 target_fe2
<int> <chr> <chr> <dbl> <dbl>
1 7 Germany Inflation 7.52 4.84
3 **8 Germany Inflation 7.92 4.74**
4 9 Germany Inflation 7.92 4.74
5 7 USA Inflation 8.02 3.72
6 8 USA Inflation 8.02 3.72
7 9 USA Inflation 8.08 3.92
for reproducability:
df <- structure(list(file_month = c(7L, 9L, 7L, 8L, 9L),
CountryName = c("Germany", "Germany", "USA", "USA", "USA"),
ProductName = c("Inflation", "Inflation", "Inflation", "Inflation", "Inflation"),
target_fe1 = c(7.52460071395529,7.92460071395529, 8.0153618755289, 8.0153618755289, 8.08264211789884),
target_fe2 = c(4.8379739368994, 4.7379739368994, 3.71980735749082, 3.71980735749082, 3.92265497264457)),
row.names = c(NA, -5L), class = "data.frame")
How can I do it, preferably using data.table?
Upvotes: 0
Views: 38
Reputation: 850
Unfortunately I do now know data table. You can do something similar with base R and tidyr
df <- structure(list(file_month = c(7L, 9L, 7L, 8L, 9L),
CountryName = c("Germany", "Germany", "USA", "USA", "USA"),
ProductName = c("Inflation", "Inflation", "Inflation", "Inflation", "Inflation"),
target_fe1 = c(7.52460071395529,7.92460071395529, 8.0153618755289, 8.0153618755289, 8.08264211789884),
target_fe2 = c(4.8379739368994, 4.7379739368994, 3.71980735749082, 3.71980735749082, 3.92265497264457)),
row.names = c(NA, -5L), class = "data.frame")
missing<-data.frame(table(df$file_month,df$CountryName))
missing<-missing[missing$Freq==0,]
new_rows<-data.frame(file_month=missing$Var1,CountryName=missing$Var2,ProductName="Inflation",target_fe1=NA,target_fe2=NA)
df<-rbind(df,new_rows)
df<-df[order(df$CountryName,df$file_month),]
tidyr::fill(df,target_fe1,target_fe2,.direction=c("updown"))
The output is
> tidyr::fill(df,target_fe1,target_fe2,.direction=c("updown"))
file_month CountryName ProductName target_fe1 target_fe2
1 7 Germany Inflation 7.524601 4.837974
6 8 Germany Inflation 7.924601 4.737974
2 9 Germany Inflation 7.924601 4.737974
3 7 USA Inflation 8.015362 3.719807
4 8 USA Inflation 8.015362 3.719807
5 9 USA Inflation 8.082642 3.922655
Upvotes: 0