Reputation: 1243
I have a following .csv
file that I downloaded online:
financials.morningstar.com/ajax/exportKR2CSV.html?&t=AAPL
I am trying to edit the spreadsheet so that the column names are dates and row names are the various ratios and indices provided in it. Below is my code, but neither column nor row names work. Any suggestions?
read.csv('/path_to_csv.csv',header=F)[-c(1:2),-c(12)]->spreadsheet
spreadsheet_body<-spreadsheet[-1,-1]
colnames(spreadsheet_body)<-spreadsheet[1,-1]
rownames(spreadsheet_body)<-c()
rownames(spreadsheet_body)<-spreadsheet[-1,1]
Upvotes: 0
Views: 84
Reputation: 107587
Consider simply transposing the data frame with t() and then replacing the headers from first row. Below uses the AAPL csv file from URL:
df <- read.csv('http://financials.morningstar.com/ajax/exportKR2CSV.html?&t=AAPL',
skip=2, stringsAsFactors = FALSE)
df <- df[df[2]!='2006-09',] # REMOVE REPEAT HEADERS
df <- df[-grep('Key Ratios', df$X),] # REMOVE KEY RATIO HEADERS
finaldf <- data.frame(t(df), stringsAsFactors = FALSE)
colnames(finaldf) <- finaldf[1,]
finaldf <- finaldf[-1,]
# PERIOD COLUMN
finaldf$Period <- as.character(rownames(finaldf))
finaldf$Period <- gsub("X", "Y", gsub("\\.", "M", finaldf$Period))
rownames(finaldf) <- 1:nrow(finaldf) # RESET ROWNAMES
finaldf <- finaldf[,c(ncol(finaldf), 2:ncol(finaldf)-1)] # RE-ORDER PERIOD TO START
# CONVERT TO NUMERIC
for (i in names(finaldf)) {
if (i != "Period") {
finaldf[[i]] <- as.numeric(gsub(",", "", finaldf[[i]]))
}
}
finaldf[, c(1:7)]
Output
# Period Revenue USD Mil Gross Margin % Operating Income USD Mil
# 1 Y2006M09 19315 29.0 2453
# 2 Y2007M09 24006 34.0 4409
# 3 Y2008M09 32479 34.3 6275
# 4 Y2009M09 42905 40.1 11740
# 5 Y2010M09 65225 39.4 18385
# 6 Y2011M09 108249 40.5 33790
# 7 Y2012M09 156508 43.9 55241
# 8 Y2013M09 170910 37.6 48999
# 9 Y2014M09 182795 38.6 52503
# 10 Y2015M09 233715 40.1 71230
# 11 TTM 227535 39.8 66864
# Operating Margin % Net Income USD Mil Earnings Per Share USD
# 1 12.7 1989 0.32
# 2 18.4 3496 0.56
# 3 19.3 4834 0.77
# 4 27.4 8235 1.30
# 5 28.2 14013 2.16
# 6 31.2 25922 3.95
# 7 35.3 41733 6.31
# 8 28.7 37037 5.68
# 9 28.7 39510 6.45
# 10 30.5 53394 9.22
# 11 29.4 50678 8.97
Upvotes: 2
Reputation: 4378
You don't give the specific message. When I run your code I get
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘’, ‘10-Year Average’, ‘3-Year Average’, ‘
I suspect this occurs because the name 3-Year Average etc. occur in many sections.
Here is a possible approach (for only three of the sections) that is tedious, but may make subsequent processing easier .
financials <- read.table('Downloads/AAPL Key Ratios.csv',sep=",",quote = '"', skip=3, nrow=15,
col.names=c('item', 'Y2006M09', 'Y2007M09', 'Y2008M09', 'Y2009M09', 'Y2010M09',
'Y2011M09', 'Y2012M09', 'Y2013M09', 'Y2014M09', 'Y2015M09', 'TTM'))
margins <- read.table('Downloads/AAPL Key Ratios.csv',sep=",",quote = '"', skip=21, nrow=9,
col.names=c('item', 'Y2006M09', 'Y2007M09', 'Y2008M09', 'Y2009M09', 'Y2010M09',
'Y2011M09', 'Y2012M09', 'Y2013M09', 'Y2014M09', 'Y2015M09', 'TTM'))
revenue_percent <- read.table('Downloads/AAPL Key Ratios.csv',sep=",",quote = '"', skip=44, nrow=4,
col.names=c('item', 'Y2006M09', 'Y2007M09', 'Y2008M09', 'Y2009M09', 'Y2010M09',
'Y2011M09', 'Y2012M09', 'Y2013M09', 'Y2014M09', 'Y2015M09', 'TTM'))
Upvotes: 2
Reputation: 5152
This does not solve all the questions, but you could try the rownames
by parts:
read.csv("http://financials.morningstar.com/ajax/exportKR2CSV.html?&t=AAPL",header=T,stringsAsFactors = F,skip = 2)[,-c(12)]->spreadsheet
#str(spreadsheet)
View(spreadsheet)
Upvotes: 2