Read .csv - Separator Issues

Question

I'm trying to read a csv from the following link: http://databank.worldbank.org/data/download/GDP.csv

I have two problems:

1. This table has different separators between its columns (e.g. the first and second columns are separated by one comma, but the second and third columns are separated by two commas).
2. Each row ends with 5 commas.

I thought about reading the table with the function read.fwf() to solve problems 1 and 2. However, I don't think this is a proper solution because values within some columns may vary in length (e.g. in the Country column one may find "United States" and "Italy").

MrFlick · Accepted Answer

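Clearly this "CSV" file has been formatted to look pretty, not to actually be useful. It's not that it has different separators; it has empty columns, so two consecutive commas simply enclose an empty field, and the trailing commas are empty columns at the end of each row. How about cleaning it up with something like: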

# skip the 5 preamble lines and keep only the columns that hold data
dd <- read.csv("http://databank.worldbank.org/data/download/GDP.csv", skip=5, header=FALSE)[, c(1,2,4,5)]
names(dd) <- c("CountryID", "Ranking", "Economy", "GDP")
dd <- dd[dd[,1] != "", ]  # get rid of rows without IDs

head(dd)

#   CountryID Ranking        Economy          GDP
# 1       USA       1  United States  16,800,000 
# 2       CHN       2          China   9,240,270 
# 3       JPN       3          Japan   4,901,530 
# 4       DEU       4        Germany   3,634,823 
# 5       FRA       5         France   2,734,949 
# 6       GBR       6 United Kingdom   2,522,261

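R can't parse numbers that contain thousands separators, so the GDP column comes in as text rather than numeric. You'll probably also want to strip the commas and convert it: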

dd$GDP <- as.numeric(gsub(",","",dd$GDP))
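
If you want to try the steps above without downloading the file, here is a minimal sketch on a small inline sample. The sample text is made up to mimic the layout described in the question (5 preamble lines, an empty third column, 5 trailing commas per row, one row without an ID); it is not the real file, and `sample_csv` is just an illustrative name. The GDP figures are copied from the output above.

```r
# Made-up sample that mimics the described layout; NOT the real World Bank file.
sample_csv <- paste(
  ",Gross domestic product,,,,,,,,",
  ",,,,,,,,,",
  ",,,,(millions of,,,,,",
  ",Ranking,,Economy,US dollars),,,,,",
  ",,,,,,,,,",
  'USA,1,,United States," 16,800,000 ",,,,,',
  'CHN,2,,China," 9,240,270 ",,,,,',
  ",,,,,,,,,",                                # a row without an ID
  'JPN,3,,Japan," 4,901,530 ",,,,,',
  sep = "\n"
)

# Same steps as above, reading from the string instead of the URL.
dd <- read.csv(text = sample_csv, skip = 5, header = FALSE,
               stringsAsFactors = FALSE)[, c(1, 2, 4, 5)]
names(dd) <- c("CountryID", "Ranking", "Economy", "GDP")

# Drop rows without an ID.
dd <- dd[dd$CountryID != "", ]

# Remove thousands separators and padding spaces, then convert to numeric.
dd$GDP <- as.numeric(gsub("[, ]", "", dd$GDP))

str(dd)
```

The only differences from the code above are `text =` in place of the URL, `stringsAsFactors = FALSE` so the result is the same on older R versions, and a pattern that also removes the padding spaces around the GDP values.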
