Reputation: 7147
I am trying to read in the following csv file into R
The problem I am running into is that Property, plant and equipment
, Gross property, pland and equipment
and Net property, plant and equipment
all have commas in their names.
How is it possible to overlook these 3 rows or ignore commas in the text fields?
This is how the data looks in excel using the text to columns
function.
The columns have been split for two of the rows and the data shifted across to 2009, which should be 2010.
EDIT: Here is the notepad data.
FACEBOOK INC A (FB) CashFlowFlag BALANCE SHEET
Fiscal year ends in December. USD in millions except per share data.,2017-12,2016-12,2015-12,2014-12,2013-12,2012-12,2011-12,2010-12,2009-12,2008-12
Assets
Current assets
Cash
Cash and cash equivalents,8079,8903,4907,4315,3323,2384,1512,1785,,
Short-term investments,33632,20546,13527,6884,8126,7242,2396,,,
Total cash,41711,29449,18434,11199,11449,9626,3908,1785,,
Receivables,5832,3993,2559,1678,1109,719,547,373,,
Prepaid expenses,1020,959,659,793,461,471,149,88,,
Other current assets,,,,,51,451,,,,
Total current assets,48563,34401,21652,13670,13070,11267,4604,2246,,
Non-current assets
"Property, plant and equipment"
"Gross property, plant and equipment",18337,11803,7819,5784,4142,3273,1925,820,,
Accumulated Depreciation,-4616,-3212,-2132,-1817,-1260,-882,-450,-246,,
"Net property, plant and equipment",13721,8591,5687,3967,2882,2391,1475,574,,
Goodwill,18221,18122,18026,17981,839,587,82,37,,
Intangible assets,1884,2535,3246,3929,883,801,80,59,,
Other long-term assets,2135,1312,796,637,221,57,90,74,,
Total non-current assets,35961,30560,27755,26514,4825,3836,1727,744,,
Total assets,84524,64961,49407,40184,17895,15103,6331,2990,,
Liabilities and stockholders' equity
Liabilities
Current liabilities
Short-term debt,,,201,,,,,,,
Capital leases,,,7,114,239,365,279,106,,
Accounts payable,380,302,196,176,87,65,63,29,,
Taxes payable,230,,,,,,,,,
Accrued liabilities,685,331,192,164,87,,296,137,,
Deferred revenues,98,90,56,66,38,30,90,42,,
Other current liabilities,2367,2152,1273,904,649,592,171,75,,
Total current liabilities,3760,2875,1925,1424,1100,1052,899,389,,
Non-current liabilities
Long-term debt,,,,,,1500,,250,,
Capital leases,,,107,119,237,491,398,117,,
Deferred taxes liabilities,,,163,987,,,,,,
Other long-term liabilities,6417,2892,2994,1558,1088,305,135,72,,
Total non-current liabilities,6417,2892,3264,2664,1325,2296,533,439,,
Total liabilities,10177,5767,5189,4088,2425,3348,1432,828,,
Stockholders' equity
Preferred stock,,,,,,,615,615,,
Additional paid-in capital,40584,38227,34886,30225,12297,10094,2684,947,,
Retained earnings,33990,21670,9787,6099,3159,1659,1606,606,,
Accumulated other comprehensive income,-227,-703,-455,-228,14,2,-6,-6,,
Total stockholders' equity,74347,59194,44218,36096,15470,11755,4899,2162,,
Total liabilities and stockholders' equity,84524,64961,49407,40184,17895,15103,6331,2990,,
Upvotes: 0
Views: 90
Reputation: 1001
Although some names have commas, they are enclosed in quotation marks, so they will be read as a block. Just read the file without header and skip the first line:
fbBalance <- read.csv('fb_balance_sheet.csv', header = FALSE, skip = 1)
> fbBalance[12:16,]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
12 Non-current assets
13 Property, plant and equipment
14 Gross property, plant and equipment 18337 11803 7819 5784 4142 3273 1925 820
15 Accumulated Depreciation -4616 -3212 -2132 -1817 -1260 -882 -450 -246
16 Net property, plant and equipment 13721 8591 5687 3967 2882 2391 1475 574
You can then adjust your data. I recommend first transpose, so you have the variables as columns:
fbT <- as.data.frame(t(fbBalance[2:ncol(fbBalance)]))
names(fbT) <- fbBalance[,1]
names(fbT)[1] <- 'Fiscal year'
fbT[,1] <- gsub('-12$', '', fbT[,1]) # keep only the year
fbT <- apply(fbT, 2, as.numeric) # convert all columns to numeric
fbT <- fbT[, colSums(is.na(fbT)) != nrow(fbT)] # remove columns that have only NAs
> fbT[,1:4]
Fiscal year Cash and cash equivalents Short-term investments Total cash
[1,] 2017 8079 33632 41711
[2,] 2016 8903 20546 29449
[3,] 2015 4907 13527 18434
[4,] 2014 4315 6884 11199
[5,] 2013 3323 8126 11449
[6,] 2012 2384 7242 9626
[7,] 2011 1512 2396 3908
[8,] 2010 1785 NA 1785
[9,] 2009 NA NA NA
[10,] 2008 NA NA NA
Upvotes: 1
Reputation: 26843
All strings with commas in then are properly quoted with "
, so there is no difficulty with reading them in using read.csv
. Assuming the file is stored as ~/Downloads/fb_balance_sheet.csv
one gets:
data <- read.csv("~/Downloads/fb_balance_sheet.csv", header = TRUE, skip = 1, stringsAsFactors = FALSE)
dim(data)
#> [1] 45 11
data[ ,1]
#> [1] "Assets"
#> [2] "Current assets"
#> [3] "Cash"
#> [4] "Cash and cash equivalents"
#> [5] "Short-term investments"
#> [6] "Total cash"
#> [7] "Receivables"
#> [8] "Prepaid expenses"
#> [9] "Other current assets"
#> [10] "Total current assets"
#> [11] "Non-current assets"
#> [12] "Property, plant and equipment"
#> [13] "Gross property, plant and equipment"
#> [14] "Accumulated Depreciation"
#> [15] "Net property, plant and equipment"
#> [16] "Goodwill"
#> [17] "Intangible assets"
#> [18] "Other long-term assets"
#> [19] "Total non-current assets"
#> [20] "Total assets"
#> [21] "Liabilities and stockholders' equity"
#> [22] "Liabilities"
#> [23] "Current liabilities"
#> [24] "Short-term debt"
#> [25] "Capital leases"
#> [26] "Accounts payable"
#> [27] "Taxes payable"
#> [28] "Accrued liabilities"
#> [29] "Deferred revenues"
#> [30] "Other current liabilities"
#> [31] "Total current liabilities"
#> [32] "Non-current liabilities"
#> [33] "Long-term debt"
#> [34] "Capital leases"
#> [35] "Deferred taxes liabilities"
#> [36] "Other long-term liabilities"
#> [37] "Total non-current liabilities"
#> [38] "Total liabilities"
#> [39] "Stockholders' equity"
#> [40] "Preferred stock"
#> [41] "Additional paid-in capital"
#> [42] "Retained earnings"
#> [43] "Accumulated other comprehensive income"
#> [44] "Total stockholders' equity"
#> [45] "Total liabilities and stockholders' equity"
Created on 2018-09-17 by the reprex package (v0.2.0).
Upvotes: 2