Reputation: 1711
I have some example data saved as a CSV file on this website.
The file 1.csv was sent to me by someone else, and I cannot read it into R correctly using read.csv.
> dat = read.csv('1.csv')
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 2 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'data/hanze/1.csv'
I then tried setting sep in read.csv, but that also failed.
dat = read.csv('1.csv', sep = ',')
dat = read.csv('1.csv', sep = '\t')
Finally, I re-saved 1.csv in Microsoft Excel as a new comma-separated CSV file named 1_test.csv, and that works.
dat = read.csv('1_test.csv', encoding = 'UTF-8')
head(dat)
id station lon lat RASTERVALU
1 1 东四 116.417 39.929 0.2406870
2 2 天坛 116.407 39.886 0.0992821
3 3 官园 116.339 39.929 0.1243020
4 4 万寿西宫 116.352 39.878 0.2394120
5 5 奥体中心 116.397 39.982 0.2368810
6 6 农展<e9><U+00A6>? 116.461 39.937 0.2307600
In my real situation, I have hundreds of files like 1.csv, and I do not want to re-save each of them as a new CSV file using Microsoft Excel.
Is there a way to read 1.csv directly and correctly into R without re-saving it?
Upvotes: 3
Views: 326
Reputation: 26495
This may introduce unforeseen errors, but it appears to provide the expected output:
library(data.table)
library(tidyverse)
test <- fread(file = "~/Downloads/1.csv")
#> Warning in fread(file = "~/Downloads/1.csv"): Detected 1 column names but the
#> data has 140 columns (i.e. invalid file). Added 139 extra default column names
#> at the end.
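# Flatten the parsed values and refill them row-wise into 4 columns
test_df <- as.data.frame(matrix(unlist(test, use.names = FALSE), ncol = 4, byrow = TRUE))
test_df %>%
  # V1 holds id and station together; split at the first separator
  separate(V1, c("id", "station"), extra = "merge") %>%
  # Drop the stray "0" characters left in the station names
  mutate(station = gsub(pattern = "0", replacement = "", x = station)) %>%
  rename("lon" = V2,
         "lat" = V3,
         "RASTERVALU" = V4)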
#> id station lon lat RASTERVALU
#> 1 1 东四 116.417 39.929 0.240687
#> 2 2 天坛 116.407 39.886 0.0992821
#> 3 3 官园 116.339 39.929 0.124302
#> 4 4 万寿西宫 116.352 39.878 0.239412
#> 5 5 奥体中心 116.397 39.982 0.236881
#> 6 6 农展馆 116.461 39.937 0.23076
#> 7 7 万柳 116.287 39.987 0.201353
#> 8 8 北部新区 116.174 40.09 0.170883
#> 9 9 植物园 116.207 40.002 0.210636
#> 10 10 丰台花园 116.279 39.863 0.225224
#> 11 11 云岗 116.146 39.824 0.23084
#> 12 12 古城 116.184 39.914 0.17514
#> 13 13 房山良乡 116.136 39.742 0.243377
#> 14 14 大兴黄村镇 116.404 39.718 0.295714
#> 15 15 亦庄开发区 116.506 39.795 0.315679
#> 16 16 通州新城 116.663 39.886 0.255555
#> 17 17 顺义新城 116.655 40.127 0.212804
#> 18 18 昌平镇 116.23 40.217 0.160067
#> 19 19 门头沟龙泉镇 116.106 39.937 0.17251
#> 20 20 平谷镇 117.1 40.143 0.275457
#> 21 21 怀柔镇 116.628 40.328 0.177003
#> 22 22 密云镇 116.832 40.37 0.253771
#> 23 23 延庆镇 115.972 40.453 0.219738
#> 24 24 昌平定陵 116.22 40.292 0.15908
#> 25 25 京西北八达岭 115.988 40.365 -9999
#> 26 26 京东北密云水库 116.911 40.499 0.173666
#> 27 27 京东东高村 117.12 40.1 0.276452
#> 28 28 京东南永乐店 116.783 39.712 0.278231
#> 29 29 京南榆垡 116.3 39.52 0.533654
#> 30 30 京西南琉璃河 116 39.58 0.449057
#> 31 31 前门东大街 116.395 39.899 0.236876
#> 32 32 永定门内大街 116.394 39.876 0.148231
#> 33 33 西直门北大街 116.349 39.954 0.234347
#> 34 34 南三环西路 116.368 39.856 0.177043
#> 35 35 东四环北路 116.483 39.939 0.253252
Created on 2021-07-26 by the reprex package (v2.0.0)
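A simpler route may also be worth trying. The "embedded nulls" warning often shows up when a UTF-16 file is read as if it were a single-byte encoding, so passing fileEncoding to read.csv might be enough. This is only a sketch under that assumption: I have not confirmed the encoding or separator of 1.csv, the helper name read_utf16 is made up for illustration, and the demo runs on a small synthetic file rather than the original data.

```r
# Assumption: the file is UTF-16LE and tab-separated; adjust both if yours differs
# (for example, another encoding such as "UTF-16" may be needed).
read_utf16 <- function(path, encoding = "UTF-16LE", sep = "\t") {
  read.csv(path, fileEncoding = encoding, sep = sep, stringsAsFactors = FALSE)
}

# Demo on a small synthetic UTF-16LE file (not the original 1.csv)
txt <- "id\tstation\tlon\n1\tA\t116.417\n2\tB\t116.407\n"
tmp <- tempfile(fileext = ".csv")
writeBin(iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]], tmp)
dat <- read_utf16(tmp)
print(dat)

# Hypothetical batch use: read every .csv in a folder ("data" is a placeholder)
# files <- list.files("data", pattern = "\\.csv$", full.names = TRUE)
# all_dat <- lapply(files, read_utf16)
```

If that reads one file cleanly, the commented lapply call at the end would cover the hundreds of similar files without re-saving anything in Excel.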
Upvotes: 3