Reputation: 113
This may be easy, but I have a CSV file with a lot of commas and R doesn't read it correctly: it puts all the data in the first column instead of presenting it as a table.
How can I make R read the file correctly, as an ordinary CSV file?
The file is available for download from the World Bank.
Upvotes: 1
Views: 1298
Reputation: 56259
Use the fread function from data.table, which has many import arguments and is also faster:
library(data.table)
res <- fread("myFile.csv",
             sep = ",",                  # separator is comma
             skip = 5,                   # skip first 5 rows
             select = c(1, 2, 4, 5),     # select columns by index
             na.strings = c("", ".."),   # treat blanks and ".." as NA
             # set column names
             col.names = c("Country", "Ranking", "Economy", "USD_Mln"))
# remove blank rows
res <- res[ !is.na(Country), ]
# convert character numbers to numbers
res[ , USD_Mln := as.numeric(gsub(",", "", USD_Mln))]
head(res)
# Country Ranking Economy USD_Mln
# 1: USA 1 United States 16244600
# 2: CHN 2 China 8227103
# 3: JPN 3 Japan 5959718
# 4: DEU 4 Germany 3428131
# 5: FRA 5 France 2612878
# 6: GBR 6 United Kingdom 2471784
Upvotes: 0
Reputation: 76673
Well, it takes some time to clean the file; fortunately, I have the time.
gdp2012 <- read.csv("getdata_data_GDP.csv", stringsAsFactors = FALSE)
cnames <- gdp2012[3, ]
names(cnames) <- NULL
cnames[1] <- "Abbrev"
cnames[5] <- "Millions.USD"
names(gdp2012) <- cnames
names(gdp2012)
[1] "Abbrev" "Ranking" "NA" "Economy" "Millions.USD"
[6] "" "NA" "NA" "NA" "NA"
gdp2012 <- gdp2012[, -grep("NA", names(gdp2012))]
gdp2012 <- gdp2012[, -ncol(gdp2012)]
gdp2012 <- gdp2012[-c(1:4, 237:nrow(gdp2012)), ]
dim(gdp2012)
[1] 232 4
str(gdp2012)
'data.frame': 232 obs. of 4 variables:
$ Abbrev : chr "USA" "CHN" "JPN" "DEU" ...
$ Ranking : chr "1" "2" "3" "4" ...
$ Economy : chr "United States" "China" "Japan" "Germany" ...
$ Millions.USD: chr " 16,244,600 " " 8,227,103 " " 5,959,718 " " 3,428,131 " ...
gdp2012[[4]] <- as.numeric(gsub(",", "", gdp2012[[4]]))
Warning message:
NAs introduced by coercion
gdp2012[[2]] <- as.numeric(gdp2012[[2]])
head(gdp2012)
Abbrev Ranking Economy Millions.USD
5 USA 1 United States 16244600
6 CHN 2 China 8227103
7 JPN 3 Japan 5959718
8 DEU 4 Germany 3428131
9 FRA 5 France 2612878
10 GBR 6 United Kingdom 2471784
If you want the row numbers to start at 1, just do
rownames(gdp2012) <- NULL
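The "NAs introduced by coercion" warning above presumably comes from cells that are still not numbers after the commas are removed. Here is a minimal sketch with made-up values, assuming the column holds a placeholder such as ".." (the value the fread answer lists in na.strings) alongside blank cells. The vector x and the names cleaned and num are hypothetical, not from the file.

```r
# Hypothetical values standing in for the Millions.USD column
x <- c(" 16,244,600 ", "..", "")

# Strip thousands separators and surrounding whitespace
cleaned <- trimws(gsub(",", "", x))

# Mark the known placeholders as missing before converting,
# so as.numeric() has nothing left that it cannot parse
cleaned[cleaned %in% c("", "..")] <- NA

num <- as.numeric(cleaned)
num
```

Marking the placeholders explicitly makes it clear which cells end up as NA, rather than leaving that to the coercion warning.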
Upvotes: 1
Reputation: 375
Delete rows 1-3 and 238-241 in the CSV file itself.
Then use the rio library: data = import("filename").
There are lots of blank rows in the data, so you can drop them with data[rowSums(is.na(data)) == 0, ], which keeps only the rows that contain no NA values.
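To illustrate what that row filter does, here is a small sketch on a toy data frame (the values are made up, and it assumes blank cells were imported as NA). It also shows a variant that drops only fully blank rows, in case some rows have a missing value you want to keep. The names no_na and not_blank are hypothetical.

```r
# Toy data frame standing in for the imported file (hypothetical values)
data <- data.frame(
  Abbrev  = c("USA", NA, "CHN", "JPN"),
  Ranking = c(1, NA, 2, NA),
  stringsAsFactors = FALSE
)

# Keep rows with no NA at all (the filter from the answer)
no_na <- data[rowSums(is.na(data)) == 0, ]

# Keep rows with at least one non-NA value, i.e. drop only fully blank rows
not_blank <- data[rowSums(!is.na(data)) > 0, ]

nrow(no_na)
nrow(not_blank)
```

The first filter also removes the JPN row because its Ranking is missing; the second keeps it.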
Upvotes: 0