Combine a mass data export into "R" without having to add cut and paste rows one by one

Question

I have more than 40,000 observations in a dataset with more than 250 variables of companies and various quantities involving meetings, attendees, representatives, the number of representatives, etc.

Using R code I created a new dataset with only four variables and whose descriptive statistics I would like export into Excel that are of interest to me:

Subset.MergedEx.SO <- mergedex1.SO[, c(10, 72, 73, 120, 121 )]

The variable numbers correspond to the following column names

mergedex1.SO <- c("sn", "earntot", "earnctot", "meeting65",      
"meeting55")

"sn" stands for company name, the rest are variables corresponding to various measurements of meetings, their duration, number of presenters, etc.

Afterwards I made a subset of datasets corresponding to each specific company in the 40,000 observation dataset with five variables instead of the original 250.

The code is as follows:

BroomeStreet <- Subset.MergedEx.SO[ which(Subset.MergedEx.SO$sn=='Broome Street'),]

CompanyA <- Subset.MergedEx.SO[ which(Subset.MergedEx.SO$sn=='Company A'),]


CompanyB <- Subset.MergedEx.SO[ which(Subset.MergedEx.SO$sn=='Company B'),] 

CompanyC <- Subset.MergedEx.SO[ which(Subset.MergedEx.SO$sn=='Company C'),]

CompanyBC <- Subset.MergedEx.SO[ which(Subset.MergedEx.SO$sn=='Company BC'),]

CompanyCC <- Subset.MergedEx.SO[ which(Subset.MergedEx.SO$sn=='Company CC'),]

and so on for more than 45 companies. [Later on I will be creating subsets by company name AND DATE spanning from 1965 to 1987 which is why I am asking this entire question for just this isolated instance where the date is immaterial for the companies involved].

My task is to extract descriptive statistics for each one of the variables following the "sn" column. I am looking for mean, standard deviation, min, max, and number of observations for variable entitled "earntot"; mean, standard deviation, min, max, and number of observations for the variable entitled "earnctot", and the same descriptive statistics for variable "meeting55" and "meeting65".

I was able to accomplish this using the following code along with a specific formula:

EarntotCompanyA <-CompanyA$earntot
EarnctotCompanyA <-CompanyA$earnctot
meet55CompanyA<-CompanyA$meet55
meet65CompanyA <-CompanyA$meet65

CompanyA_ALL_INFORMATION<-cbind(EarntotCompanyA,EarnctotCompanyA,
                            meet55CompanyA,meet65CompanyA)

library(psych)
info<-describe(CompanyA_ALL_INFORMATION)
n<-info[,2]                 # vector of total number
mean<-info[,3]              # vector of mean
sd<-info[,4]                # vector of sd
min<-info[,8]               # vector of min
max<-info[,9]               # vector of max

#this is ordered by the naming function below
value<-round(c(mean,sd,min,max,n),2) 

col.names<-naming(CompanyA_ALL_INFORMATION)



descriptives<-t(as.data.frame(value))
colnames(descriptives)<-col.names
rownames(descriptives)<-"Company A" 

library(xlsx)
write.xlsx(descriptives, "descriptives.CompanyA.xlsx")

Upon completing this, I get a single row in Excel with previously the information I need.

Then I proceed the follow the same exact step as above except using a different company to get yet another separate file such as "descriptive.CompanyB.xlsx", "descriptives.CompanyC.xlsx", ....

I cut and paste all the rows from each one of the more than 50 open excel windows and combine them into another separate excel window containing all the information I want.

An example of a single row looks like this:

average.number.of.EarntotCompanyA   average.number.of.EarnctotCompanyA    
average.number.of.meet55CompanyA    average.number.of.meet65CompanyA    standard.deviation.of.EarntotCompanyA   standard.deviation.of.EarnctotCompanyA  standard.deviation.of.meet55CompanyA    standard.deviation.of.meet65CompanyA    min.number.of.EarntotCompanyA   min.number.of.EarnctotCompanyA  min.number.of.meet55CompanyA    min.number.of.meet65CompanyA    max.number.of.EarntotCompanyA   max.number.of.EarnctotCompanyA  max.number.of.meet55CompanyA    max.number.of.meet65CompanyA    total.number.of.EarntotCompanyA total.number.of.EarnctotCompanyA    total.number.of.meet55CompanyA  total.number.of.meet65CompanyA
Company A   16.58   22.91   1   1.85    15.68   16.81   1.75    2.34    0     
0   0   0   84.11   92.11   5   9   176 176 69  229

How can I have all the rows appear in one single file without getting each one by itself, having to cut and paste it from each single excel file and then having to paste it into a separate file. I have had more than 50 excel files open in the background with the precise information that I need but it is only available one at a time.

Below is a reproducible example of the data:

> dput((head(Subset.MergedEx.SO, 120)))

structure(list(sn = structure(c(2L, 2L, 3L, 5L, 2L, 7L, 1L, 9L, 
1L, 9L, NA, 9L, 1L, 26L, 11L, 9L, 7L, NA, NA, 7L, 17L, 9L, NA, 
21L, 7L, 17L, 7L, 7L, 16L, 7L, 7L, 7L, 7L, 26L, 7L, 6L, 26L, 
22L, NA, NA, 11L, 23L, 23L, 26L, NA, 7L, 23L, 1L, NA, 1L, 7L, 
11L, 12L, 13L, 9L, NA, 15L, NA, 20L, 15L, NA, 17L, 5L, NA, 22L, 
15L, NA, NA, 5L, 8L, 32L, 29L, 23L, 33L, 1L, 23L, 14L, 6L, 7L, 
15L, 15L, 29L, NA, 21L, 6L, 35L, 32L, 32L, 7L, 31L, 23L, 23L, 
1L, 29L, 34L, 34L, 34L, 17L, 24L, 24L, 24L, 24L, 7L, 16L, 7L, 
23L, 23L, 34L, 29L, 15L, NA, 35L, 24L, 27L, 33L, 35L, 10L, 34L, 
33L, 34L), .Label = c("Broome Street", "Company A", "Company B", 
"Company BC", "Company C", "Company CC", "Company D Clinton", 
"Company DD", "Company E", "Company ED BroadCompany", "Company G", 
"Company H BroadCompany", "Company I BroadCompany", "Company I Studio", 
"Company J", "Company K", "Company L", "Company M", "Company M   
BroadCompany", 
"Company M HS BroadCompany", "Company MCC BroadCompany", "Company N", 
"Company P", "Company Q", "Company Q Company N", "Company Q Company ZZ", 
"Company R - Company ZZ", "Company SLab", "Company Z", "Company ZE", 
"Company ZED", "Company ZEQ", "Company ZZ", "Company ZZQ", "Company ZZQ    
Company N"), class = "factor"), earntot = c(21.85, 20.8, NA, 8.16, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, 7.16, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 43.32, NA, 30.48, NA, NA, 34.9, NA, NA, NA, 
NA, NA, 25.82, 40.75, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 
NA, NA, NA, 30, NA, NA, NA, NA, NA, NA, 39.1, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, 52.29, 44.32, NA, 7, 38.32, 0, NA, NA, 
8.25, NA, NA, NA, NA, NA, 51.12, 39.9, NA, 37.48, 32.74, NA, 
NA, NA, 33.4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 30.82, 
NA, NA, NA, NA, NA, 5.74, NA, NA, NA, NA, NA, NA, NA, NA, 44.48, 
NA), earnctot = c(29.43, 20.8, NA, 8.16, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 7.16, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 49.9, NA, 37.56, NA, NA, 41.98, NA, NA, NA, NA, NA, 37.32, 
49, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, 37, NA, 
NA, NA, NA, NA, NA, 47.68, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 57.29, 48.48, NA, 7, 45.9, 0, NA, NA, 15.75, NA, NA, NA, 
NA, NA, 54.12, 46.65, NA, 45.56, 39.9, NA, NA, NA, 39.98, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 38.4, NA, NA, NA, NA, 
NA, 12.9, NA, NA, NA, NA, NA, NA, NA, NA, 52.06, NA), meet55 = c(0L, 
0L, NA, NA, NA, NA, 1L, NA, NA, NA, NA, 5L, NA, 0L, NA, 5L, NA, 
NA, NA, 0L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 0L, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 4L, 0L, NA, 
NA, NA, 4L, 4L, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, 1L, 
NA, NA, NA, NA, 1L, NA, NA, 0L, 4L, 0L, NA, NA, 0L, NA, NA, NA, 
NA, NA, 4L, 3L, 5L, NA, NA, NA, 1L, NA, 0L, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, 5L, NA, NA, NA, NA, NA, 0L, NA, 0L, NA, 
NA, NA, NA, NA, NA, NA, NA), meet65 = c(0L, 0L, 5L, 0L, 6L, NA, 
0L, 5L, NA, 5L, NA, 6L, NA, 0L, 5L, 2L, NA, NA, NA, 0L, 5L, 5L, 
NA, NA, NA, 0L, NA, 1L, 4L, 7L, 5L, 5L, 7L, 0L, 5L, NA, 0L, 1L, 
NA, NA, NA, 2L, 0L, 6L, NA, 8L, 2L, 0L, NA, 4L, 0L, 1L, 3L, NA, 
NA, NA, NA, NA, 4L, 0L, NA, 5L, 7L, NA, 0L, NA, NA, NA, 5L, 0L, 
5L, 4L, 0L, 2L, 0L, 0L, 7L, 0L, NA, 5L, NA, 8L, NA, 0L, 1L, 7L, 
0L, 4L, 7L, 0L, 3L, 0L, NA, NA, 7L, 5L, 8L, 5L, 5L, 6L, 5L, 6L, 
5L, 2L, 0L, 8L, 7L, 7L, 5L, 0L, NA, 0L, 6L, NA, 8L, 8L, 5L, 7L, 
7L, 6L)), .Names = c("sn", "earntot", "earnctot", "meet55", "meet65"
), row.names = c(NA, 120L), class = "data.frame")

lukeA · Accepted Answer

I suggest

# install.packages("dplyr") # uncomment and run if you have to
library(dplyr)
Subset.MergedEx.SO %>% group_by(sn) %>% 
  summarise_each(funs(n(), mean(., na.rm = TRUE), sd(., na.rm = TRUE), min(., na.rm = TRUE), max(., na.rm = TRUE))) %>%
  write.csv2(tf <<- tempfile(fileext = ".csv"))
cat(tf) # open that file in excel

You may have to tweak write.csv2 (i.e. use write.csv or write.table with sep=" ") depending on your Excel/OS configuration.

Combine a mass data export into "R" without having to add cut and paste rows one by one

Answers (1)

Related Questions

Combine a mass data export into &quot;R&quot; without having to add cut and paste rows one by one

Answers (1)

Related Questions

Combine a mass data export into "R" without having to add cut and paste rows one by one