Oposum
Oposum

Reputation: 1243

Transforming a .csv file in R

I have a following .csv file that I downloaded online:

financials.morningstar.com/ajax/exportKR2CSV.html?&t=AAPL

I am trying to edit the spreadsheet so that the column names are dates and row names are the various ratios and indices provided in it. Below is my code, but neither column nor row names work. Any suggestions?

read.csv('/path_to_csv.csv',header=F)[-c(1:2),-c(12)]->spreadsheet
spreadsheet_body<-spreadsheet[-1,-1]
colnames(spreadsheet_body)<-spreadsheet[1,-1]
rownames(spreadsheet_body)<-c()
rownames(spreadsheet_body)<-spreadsheet[-1,1]

Upvotes: 0

Views: 84

Answers (3)

Parfait
Parfait

Reputation: 107587

Consider simply transposing the data frame with t() and then replacing the headers from first row. Below uses the AAPL csv file from URL:

df <- read.csv('http://financials.morningstar.com/ajax/exportKR2CSV.html?&t=AAPL', 
           skip=2, stringsAsFactors = FALSE)
df <- df[df[2]!='2006-09',]               # REMOVE REPEAT HEADERS
df <- df[-grep('Key Ratios', df$X),]      # REMOVE KEY RATIO HEADERS

finaldf <- data.frame(t(df), stringsAsFactors = FALSE)
colnames(finaldf) <- finaldf[1,]
finaldf <- finaldf[-1,]

# PERIOD COLUMN
finaldf$Period <- as.character(rownames(finaldf))
finaldf$Period <- gsub("X", "Y", gsub("\\.", "M", finaldf$Period))

rownames(finaldf) <- 1:nrow(finaldf)                      # RESET ROWNAMES
finaldf <- finaldf[,c(ncol(finaldf), 2:ncol(finaldf)-1)]  # RE-ORDER PERIOD TO START

# CONVERT TO NUMERIC
for (i in names(finaldf)) {
        if (i != "Period") {
          finaldf[[i]] <- as.numeric(gsub(",", "", finaldf[[i]]))
        }
     }

finaldf[, c(1:7)]

Output

#      Period Revenue USD Mil Gross Margin % Operating Income USD Mil
# 1  Y2006M09           19315           29.0                     2453
# 2  Y2007M09           24006           34.0                     4409
# 3  Y2008M09           32479           34.3                     6275
# 4  Y2009M09           42905           40.1                    11740
# 5  Y2010M09           65225           39.4                    18385
# 6  Y2011M09          108249           40.5                    33790
# 7  Y2012M09          156508           43.9                    55241
# 8  Y2013M09          170910           37.6                    48999
# 9  Y2014M09          182795           38.6                    52503
# 10 Y2015M09          233715           40.1                    71230
# 11      TTM          227535           39.8                    66864
#    Operating Margin % Net Income USD Mil Earnings Per Share USD
# 1                12.7               1989                   0.32
# 2                18.4               3496                   0.56
# 3                19.3               4834                   0.77
# 4                27.4               8235                   1.30
# 5                28.2              14013                   2.16
# 6                31.2              25922                   3.95
# 7                35.3              41733                   6.31
# 8                28.7              37037                   5.68
# 9                28.7              39510                   6.45
# 10               30.5              53394                   9.22
# 11               29.4              50678                   8.97

Upvotes: 2

Andrew Lavers
Andrew Lavers

Reputation: 4378

You don't give the specific message. When I run your code I get

duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘’, ‘10-Year Average’, ‘3-Year  Average’, ‘

I suspect this occurs because the name 3-Year Average etc. occur in many sections.

Here is a possible approach (for only three of the sections) that is tedious, but may make subsequent processing easier .

financials <- read.table('Downloads/AAPL Key Ratios.csv',sep=",",quote = '"', skip=3, nrow=15,
                         col.names=c('item', 'Y2006M09',    'Y2007M09',     'Y2008M09',     'Y2009M09', 'Y2010M09', 
                                     'Y2011M09',    'Y2012M09', 'Y2013M09', 'Y2014M09', 'Y2015M09', 'TTM'))
margins    <- read.table('Downloads/AAPL Key Ratios.csv',sep=",",quote = '"', skip=21, nrow=9,  
                         col.names=c('item', 'Y2006M09',    'Y2007M09',     'Y2008M09',     'Y2009M09', 'Y2010M09', 
                                     'Y2011M09',    'Y2012M09', 'Y2013M09', 'Y2014M09', 'Y2015M09', 'TTM'))
revenue_percent <- read.table('Downloads/AAPL Key Ratios.csv',sep=",",quote = '"', skip=44, nrow=4,  
                         col.names=c('item', 'Y2006M09',    'Y2007M09',     'Y2008M09',     'Y2009M09', 'Y2010M09', 
                                     'Y2011M09',    'Y2012M09', 'Y2013M09', 'Y2014M09', 'Y2015M09', 'TTM'))

Upvotes: 2

Robert
Robert

Reputation: 5152

This does not solve all the questions, but you could try the rownames by parts:

read.csv("http://financials.morningstar.com/ajax/exportKR2CSV.html?&t=AAPL",header=T,stringsAsFactors = F,skip = 2)[,-c(12)]->spreadsheet
#str(spreadsheet)
View(spreadsheet)

Upvotes: 2

Related Questions