Reputation: 65
I am quite new to R. I have unbalanced panel data. BvD_ID_Number is the identification number for each firm, TotalAsset is the value for Total Assets from Balance sheets, in each period of time (Year). Here an overview:
structure(list(BvD_ID_Number = c("FR810911719", "FR810911719",
"GBFC024701", "GBFC024701", "GBFC024701", "GBFC32536", "GBFC32699",
"GBFC32699", "GBFC032748", "GBFC032748"), Year = c(2017, 2016,
2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016), TotalAsset = c(2220,
1174, 124726, 126010, 121837, 72912, 111298, 74457, 6579, 6056
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
I want, for each BvD_ID_Number, to compute the average value of TotalAsset for the entire available time window.
I used this code, but it doesn't work well:
library(dplyr)
df <-
p_TotalAsset1 %>%
group_by(p_TotalAsset1$BvD_ID_Number) %>%
mutate(TotalAsset_Avegage = round(mean(p_TotalAsset1$TotalAsset)))
Thanks for helping me.
Upvotes: 0
Views: 873
Reputation: 7385
You can use summarize
or mutate
:
Using summarize
will fully aggregate your data to provide only the grouping variable (ID number for each firm) and the mean.
df %>%
group_by(BvD_ID_Number) %>%
summarize(TotalAsset_Average = round(mean(TotalAsset),0))
This gives us:
BvD_ID_Number TotalAsset_Average
<chr> <dbl>
1 FR810911719 1697
2 GBFC024701 124191
3 GBFC032748 6318.
4 GBFC32536 72912
5 GBFC32699 92878.
Using mutate
df %>%
group_by(BvD_ID_Number) %>%
mutate(TotalAsset_Average = round(mean(TotalAsset),0))
This gives us:
# A tibble: 10 x 4
# Groups: BvD_ID_Number [5]
BvD_ID_Number Year TotalAsset TotalAsset_Average
<chr> <dbl> <dbl> <dbl>
1 FR810911719 2017 2220 1697
2 FR810911719 2016 1174 1697
3 GBFC024701 2018 124726 124191
4 GBFC024701 2017 126010 124191
5 GBFC024701 2016 121837 124191
6 GBFC32536 2017 72912 72912
7 GBFC32699 2016 111298 92878
8 GBFC32699 2015 74457 92878
9 GBFC032748 2017 6579 6318
10 GBFC032748 2016 6056 6318
Data:
structure(list(BvD_ID_Number = c("FR810911719", "FR810911719",
"GBFC024701", "GBFC024701", "GBFC024701", "GBFC32536", "GBFC32699",
"GBFC32699", "GBFC032748", "GBFC032748"), Year = c(2017, 2016,
2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016), TotalAsset = c(2220,
1174, 124726, 126010, 121837, 72912, 111298, 74457, 6579, 6056
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
Upvotes: 1