Reputation: 107
I have more than 40,000 observations in a dataset with more than 250 variables of companies and various quantities involving meetings, attendees, representatives, the number of representatives, etc.
Using R code I created a new dataset with only four variables and whose descriptive statistics I would like export into Excel that are of interest to me:
Subset.MergedEx.SO <- mergedex1.SO[, c(10, 72, 73, 120, 121 )]
The variable numbers correspond to the following column names
mergedex1.SO <- c("sn", "earntot", "earnctot", "meeting65",
"meeting55")
"sn" stands for company name, the rest are variables corresponding to various measurements of meetings, their duration, number of presenters, etc.
Afterwards I made a subset of datasets corresponding to each specific company in the 40,000 observation dataset with five variables instead of the original 250.
The code is as follows:
BroomeStreet <- Subset.MergedEx.SO[ which(Subset.MergedEx.SO$sn=='Broome Street'),]
CompanyA <- Subset.MergedEx.SO[ which(Subset.MergedEx.SO$sn=='Company A'),]
CompanyB <- Subset.MergedEx.SO[ which(Subset.MergedEx.SO$sn=='Company B'),]
CompanyC <- Subset.MergedEx.SO[ which(Subset.MergedEx.SO$sn=='Company C'),]
CompanyBC <- Subset.MergedEx.SO[ which(Subset.MergedEx.SO$sn=='Company BC'),]
CompanyCC <- Subset.MergedEx.SO[ which(Subset.MergedEx.SO$sn=='Company CC'),]
and so on for more than 45 companies. [Later on I will be creating subsets by company name AND DATE spanning from 1965 to 1987 which is why I am asking this entire question for just this isolated instance where the date is immaterial for the companies involved].
My task is to extract descriptive statistics for each one of the variables following the "sn" column. I am looking for mean, standard deviation, min, max, and number of observations for variable entitled "earntot"; mean, standard deviation, min, max, and number of observations for the variable entitled "earnctot", and the same descriptive statistics for variable "meeting55" and "meeting65".
I was able to accomplish this using the following code along with a specific formula:
EarntotCompanyA <-CompanyA$earntot
EarnctotCompanyA <-CompanyA$earnctot
meet55CompanyA<-CompanyA$meet55
meet65CompanyA <-CompanyA$meet65
CompanyA_ALL_INFORMATION<-cbind(EarntotCompanyA,EarnctotCompanyA,
meet55CompanyA,meet65CompanyA)
library(psych)
info<-describe(CompanyA_ALL_INFORMATION)
n<-info[,2] # vector of total number
mean<-info[,3] # vector of mean
sd<-info[,4] # vector of sd
min<-info[,8] # vector of min
max<-info[,9] # vector of max
#this is ordered by the naming function below
value<-round(c(mean,sd,min,max,n),2)
col.names<-naming(CompanyA_ALL_INFORMATION)
descriptives<-t(as.data.frame(value))
colnames(descriptives)<-col.names
rownames(descriptives)<-"Company A"
library(xlsx)
write.xlsx(descriptives, "descriptives.CompanyA.xlsx")
Upon completing this, I get a single row in Excel with previously the information I need.
Then I proceed the follow the same exact step as above except using a different company to get yet another separate file such as "descriptive.CompanyB.xlsx", "descriptives.CompanyC.xlsx", ....
I cut and paste all the rows from each one of the more than 50 open excel windows and combine them into another separate excel window containing all the information I want.
An example of a single row looks like this:
average.number.of.EarntotCompanyA average.number.of.EarnctotCompanyA
average.number.of.meet55CompanyA average.number.of.meet65CompanyA standard.deviation.of.EarntotCompanyA standard.deviation.of.EarnctotCompanyA standard.deviation.of.meet55CompanyA standard.deviation.of.meet65CompanyA min.number.of.EarntotCompanyA min.number.of.EarnctotCompanyA min.number.of.meet55CompanyA min.number.of.meet65CompanyA max.number.of.EarntotCompanyA max.number.of.EarnctotCompanyA max.number.of.meet55CompanyA max.number.of.meet65CompanyA total.number.of.EarntotCompanyA total.number.of.EarnctotCompanyA total.number.of.meet55CompanyA total.number.of.meet65CompanyA
Company A 16.58 22.91 1 1.85 15.68 16.81 1.75 2.34 0
0 0 0 84.11 92.11 5 9 176 176 69 229
How can I have all the rows appear in one single file without getting each one by itself, having to cut and paste it from each single excel file and then having to paste it into a separate file. I have had more than 50 excel files open in the background with the precise information that I need but it is only available one at a time.
Below is a reproducible example of the data:
> dput((head(Subset.MergedEx.SO, 120)))
structure(list(sn = structure(c(2L, 2L, 3L, 5L, 2L, 7L, 1L, 9L,
1L, 9L, NA, 9L, 1L, 26L, 11L, 9L, 7L, NA, NA, 7L, 17L, 9L, NA,
21L, 7L, 17L, 7L, 7L, 16L, 7L, 7L, 7L, 7L, 26L, 7L, 6L, 26L,
22L, NA, NA, 11L, 23L, 23L, 26L, NA, 7L, 23L, 1L, NA, 1L, 7L,
11L, 12L, 13L, 9L, NA, 15L, NA, 20L, 15L, NA, 17L, 5L, NA, 22L,
15L, NA, NA, 5L, 8L, 32L, 29L, 23L, 33L, 1L, 23L, 14L, 6L, 7L,
15L, 15L, 29L, NA, 21L, 6L, 35L, 32L, 32L, 7L, 31L, 23L, 23L,
1L, 29L, 34L, 34L, 34L, 17L, 24L, 24L, 24L, 24L, 7L, 16L, 7L,
23L, 23L, 34L, 29L, 15L, NA, 35L, 24L, 27L, 33L, 35L, 10L, 34L,
33L, 34L), .Label = c("Broome Street", "Company A", "Company B",
"Company BC", "Company C", "Company CC", "Company D Clinton",
"Company DD", "Company E", "Company ED BroadCompany", "Company G",
"Company H BroadCompany", "Company I BroadCompany", "Company I Studio",
"Company J", "Company K", "Company L", "Company M", "Company M
BroadCompany",
"Company M HS BroadCompany", "Company MCC BroadCompany", "Company N",
"Company P", "Company Q", "Company Q Company N", "Company Q Company ZZ",
"Company R - Company ZZ", "Company SLab", "Company Z", "Company ZE",
"Company ZED", "Company ZEQ", "Company ZZ", "Company ZZQ", "Company ZZQ
Company N"), class = "factor"), earntot = c(21.85, 20.8, NA, 8.16, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 7.16, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 43.32, NA, 30.48, NA, NA, 34.9, NA, NA, NA,
NA, NA, 25.82, 40.75, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0,
NA, NA, NA, 30, NA, NA, NA, NA, NA, NA, 39.1, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 52.29, 44.32, NA, 7, 38.32, 0, NA, NA,
8.25, NA, NA, NA, NA, NA, 51.12, 39.9, NA, 37.48, 32.74, NA,
NA, NA, 33.4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 30.82,
NA, NA, NA, NA, NA, 5.74, NA, NA, NA, NA, NA, NA, NA, NA, 44.48,
NA), earnctot = c(29.43, 20.8, NA, 8.16, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 7.16, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 49.9, NA, 37.56, NA, NA, 41.98, NA, NA, NA, NA, NA, 37.32,
49, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, 37, NA,
NA, NA, NA, NA, NA, 47.68, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 57.29, 48.48, NA, 7, 45.9, 0, NA, NA, 15.75, NA, NA, NA,
NA, NA, 54.12, 46.65, NA, 45.56, 39.9, NA, NA, NA, 39.98, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 38.4, NA, NA, NA, NA,
NA, 12.9, NA, NA, NA, NA, NA, NA, NA, NA, 52.06, NA), meet55 = c(0L,
0L, NA, NA, NA, NA, 1L, NA, NA, NA, NA, 5L, NA, 0L, NA, 5L, NA,
NA, NA, 0L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 0L, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 4L, 0L, NA,
NA, NA, 4L, 4L, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, 1L,
NA, NA, NA, NA, 1L, NA, NA, 0L, 4L, 0L, NA, NA, 0L, NA, NA, NA,
NA, NA, 4L, 3L, 5L, NA, NA, NA, 1L, NA, 0L, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 5L, NA, NA, NA, NA, NA, 0L, NA, 0L, NA,
NA, NA, NA, NA, NA, NA, NA), meet65 = c(0L, 0L, 5L, 0L, 6L, NA,
0L, 5L, NA, 5L, NA, 6L, NA, 0L, 5L, 2L, NA, NA, NA, 0L, 5L, 5L,
NA, NA, NA, 0L, NA, 1L, 4L, 7L, 5L, 5L, 7L, 0L, 5L, NA, 0L, 1L,
NA, NA, NA, 2L, 0L, 6L, NA, 8L, 2L, 0L, NA, 4L, 0L, 1L, 3L, NA,
NA, NA, NA, NA, 4L, 0L, NA, 5L, 7L, NA, 0L, NA, NA, NA, 5L, 0L,
5L, 4L, 0L, 2L, 0L, 0L, 7L, 0L, NA, 5L, NA, 8L, NA, 0L, 1L, 7L,
0L, 4L, 7L, 0L, 3L, 0L, NA, NA, 7L, 5L, 8L, 5L, 5L, 6L, 5L, 6L,
5L, 2L, 0L, 8L, 7L, 7L, 5L, 0L, NA, 0L, 6L, NA, 8L, 8L, 5L, 7L,
7L, 6L)), .Names = c("sn", "earntot", "earnctot", "meet55", "meet65"
), row.names = c(NA, 120L), class = "data.frame")
Upvotes: 0
Views: 92
Reputation: 54237
I suggest
# install.packages("dplyr") # uncomment and run if you have to
library(dplyr)
Subset.MergedEx.SO %>% group_by(sn) %>%
summarise_each(funs(n(), mean(., na.rm = TRUE), sd(., na.rm = TRUE), min(., na.rm = TRUE), max(., na.rm = TRUE))) %>%
write.csv2(tf <<- tempfile(fileext = ".csv"))
cat(tf) # open that file in excel
You may have to tweak write.csv2
(i.e. use write.csv
or write.table
with sep="\t"
) depending on your Excel/OS configuration.
Upvotes: 1