user113156
user113156

Reputation: 7147

Reading in csv files with commas in the text

I am trying to read in the following csv file into R

file

The problem I am running into is that Property, plant and equipment, Gross property, pland and equipment and Net property, plant and equipment all have commas in their names.

How is it possible to overlook these 3 rows or ignore commas in the text fields?

This is how the data looks in excel using the text to columns function.

excel screen

The columns have been split for two of the rows and the data shifted across to 2009, which should be 2010.

EDIT: Here is the notepad data.

FACEBOOK INC A  (FB) CashFlowFlag BALANCE SHEET
Fiscal year ends in December. USD in millions except per share data.,2017-12,2016-12,2015-12,2014-12,2013-12,2012-12,2011-12,2010-12,2009-12,2008-12
Assets
Current assets
Cash
Cash and cash equivalents,8079,8903,4907,4315,3323,2384,1512,1785,,
Short-term investments,33632,20546,13527,6884,8126,7242,2396,,,
Total cash,41711,29449,18434,11199,11449,9626,3908,1785,,
Receivables,5832,3993,2559,1678,1109,719,547,373,,
Prepaid expenses,1020,959,659,793,461,471,149,88,,
Other current assets,,,,,51,451,,,,
Total current assets,48563,34401,21652,13670,13070,11267,4604,2246,,
Non-current assets
"Property, plant and equipment"
"Gross property, plant and equipment",18337,11803,7819,5784,4142,3273,1925,820,,
Accumulated Depreciation,-4616,-3212,-2132,-1817,-1260,-882,-450,-246,,
"Net property, plant and equipment",13721,8591,5687,3967,2882,2391,1475,574,,
Goodwill,18221,18122,18026,17981,839,587,82,37,,
Intangible assets,1884,2535,3246,3929,883,801,80,59,,
Other long-term assets,2135,1312,796,637,221,57,90,74,,
Total non-current assets,35961,30560,27755,26514,4825,3836,1727,744,,
Total assets,84524,64961,49407,40184,17895,15103,6331,2990,,
Liabilities and stockholders' equity
Liabilities
Current liabilities
Short-term debt,,,201,,,,,,,
Capital leases,,,7,114,239,365,279,106,,
Accounts payable,380,302,196,176,87,65,63,29,,
Taxes payable,230,,,,,,,,,
Accrued liabilities,685,331,192,164,87,,296,137,,
Deferred revenues,98,90,56,66,38,30,90,42,,
Other current liabilities,2367,2152,1273,904,649,592,171,75,,
Total current liabilities,3760,2875,1925,1424,1100,1052,899,389,,
Non-current liabilities
Long-term debt,,,,,,1500,,250,,
Capital leases,,,107,119,237,491,398,117,,
Deferred taxes liabilities,,,163,987,,,,,,
Other long-term liabilities,6417,2892,2994,1558,1088,305,135,72,,
Total non-current liabilities,6417,2892,3264,2664,1325,2296,533,439,,
Total liabilities,10177,5767,5189,4088,2425,3348,1432,828,,
Stockholders' equity
Preferred stock,,,,,,,615,615,,
Additional paid-in capital,40584,38227,34886,30225,12297,10094,2684,947,,
Retained earnings,33990,21670,9787,6099,3159,1659,1606,606,,
Accumulated other comprehensive income,-227,-703,-455,-228,14,2,-6,-6,,
Total stockholders' equity,74347,59194,44218,36096,15470,11755,4899,2162,,
Total liabilities and stockholders' equity,84524,64961,49407,40184,17895,15103,6331,2990,,

Upvotes: 0

Views: 90

Answers (2)

Carlos Eduardo Lagosta
Carlos Eduardo Lagosta

Reputation: 1001

Although some names have commas, they are enclosed in quotation marks, so they will be read as a block. Just read the file without header and skip the first line:

fbBalance <- read.csv('fb_balance_sheet.csv', header = FALSE, skip = 1)

> fbBalance[12:16,]
                                    V1    V2    V3    V4    V5    V6   V7   V8   V9 V10 V11
12                  Non-current assets                                                     
13       Property, plant and equipment                                                     
14 Gross property, plant and equipment 18337 11803  7819  5784  4142 3273 1925  820        
15            Accumulated Depreciation -4616 -3212 -2132 -1817 -1260 -882 -450 -246        
16   Net property, plant and equipment 13721  8591  5687  3967  2882 2391 1475  574      

You can then adjust your data. I recommend first transpose, so you have the variables as columns:

fbT <- as.data.frame(t(fbBalance[2:ncol(fbBalance)]))

names(fbT) <- fbBalance[,1]

names(fbT)[1] <- 'Fiscal year'

fbT[,1] <- gsub('-12$', '', fbT[,1]) # keep only the year

fbT <- apply(fbT, 2, as.numeric)  # convert all columns to numeric

fbT <- fbT[, colSums(is.na(fbT)) != nrow(fbT)] # remove columns that have only NAs

> fbT[,1:4]
      Fiscal year Cash and cash equivalents Short-term investments Total cash
[1,]         2017                      8079                  33632      41711
[2,]         2016                      8903                  20546      29449
[3,]         2015                      4907                  13527      18434
[4,]         2014                      4315                   6884      11199
[5,]         2013                      3323                   8126      11449
[6,]         2012                      2384                   7242       9626
[7,]         2011                      1512                   2396       3908
[8,]         2010                      1785                     NA       1785
[9,]         2009                        NA                     NA         NA
[10,]        2008                        NA                     NA         NA

Upvotes: 1

Ralf Stubner
Ralf Stubner

Reputation: 26843

All strings with commas in then are properly quoted with ", so there is no difficulty with reading them in using read.csv. Assuming the file is stored as ~/Downloads/fb_balance_sheet.csv one gets:

data <- read.csv("~/Downloads/fb_balance_sheet.csv", header = TRUE, skip = 1, stringsAsFactors = FALSE)
dim(data)
#> [1] 45 11
data[ ,1]
#>  [1] "Assets"                                    
#>  [2] "Current assets"                            
#>  [3] "Cash"                                      
#>  [4] "Cash and cash equivalents"                 
#>  [5] "Short-term investments"                    
#>  [6] "Total cash"                                
#>  [7] "Receivables"                               
#>  [8] "Prepaid expenses"                          
#>  [9] "Other current assets"                      
#> [10] "Total current assets"                      
#> [11] "Non-current assets"                        
#> [12] "Property, plant and equipment"             
#> [13] "Gross property, plant and equipment"       
#> [14] "Accumulated Depreciation"                  
#> [15] "Net property, plant and equipment"         
#> [16] "Goodwill"                                  
#> [17] "Intangible assets"                         
#> [18] "Other long-term assets"                    
#> [19] "Total non-current assets"                  
#> [20] "Total assets"                              
#> [21] "Liabilities and stockholders' equity"      
#> [22] "Liabilities"                               
#> [23] "Current liabilities"                       
#> [24] "Short-term debt"                           
#> [25] "Capital leases"                            
#> [26] "Accounts payable"                          
#> [27] "Taxes payable"                             
#> [28] "Accrued liabilities"                       
#> [29] "Deferred revenues"                         
#> [30] "Other current liabilities"                 
#> [31] "Total current liabilities"                 
#> [32] "Non-current liabilities"                   
#> [33] "Long-term debt"                            
#> [34] "Capital leases"                            
#> [35] "Deferred taxes liabilities"                
#> [36] "Other long-term liabilities"               
#> [37] "Total non-current liabilities"             
#> [38] "Total liabilities"                         
#> [39] "Stockholders' equity"                      
#> [40] "Preferred stock"                           
#> [41] "Additional paid-in capital"                
#> [42] "Retained earnings"                         
#> [43] "Accumulated other comprehensive income"    
#> [44] "Total stockholders' equity"                
#> [45] "Total liabilities and stockholders' equity"

Created on 2018-09-17 by the reprex package (v0.2.0).

Upvotes: 2

Related Questions