Reputation: 45
I have a csv file (dropbox link), which contains two types of information separated by a blank/empty line:
The header of the second table consists of 3 separate lines/rows. Ideally I would like to keep only one, which starts with ID.
I would like to import the above into R
in a list
format:
> print(data)
$experiment name
[1] "test1"
$date
[1] "01/01/2015"
$protocol
[1] "test2"
$name
[1] "John Doe"
$data
ID name value
1 A1 AA 0.1
2 A2 BB 0.16
3 A3 CC 0.12
4 A4 DD 0.45
5 A5 EE 0.35
6 B1 FF 1.2
7 B2 GG 1.12
8 B3 HH 1.05
9 B4 II 1.21
10 B5 JJ 1.18
I have tried to scan
the csv file and conditionally import the lines, but to no avail. Is there a base function or a set of functions that would allow me to import the two tables separated with an empty line, and conditionally read-in the second table from the ID character onwards?
I would greatly appreciate your help!
Upvotes: 1
Views: 150
Reputation: 7941
A starting point would be:
data <- list()
all <- read.csv("data.csv", header=F, stringsAsFactors=F)
blank.line <- which(all[[1]]=="")
for (i in 1:(blank.line-1)) data[[all$V1[i]]] <- all$V2[i]
data$data <- read.csv("data.csv", skip=grep("^ID",all[,1])-1, header=T, stringsAsFactors=F)
the names of list elements in data aren't exactly as you've written - these could be fixed with gsub()
ing the colons out if it is important.
Code edited in the light of comments about top table having a variable length
Upvotes: 1
Reputation: 121568
You can do something like this (not exactly what do you want but it is a good start) :
## use readlines since you unstrctred data
ll <- readLines("data.csv")
list(
## I assume that the data is always in 3 first lines
## remove extra comma then use read.table
settings = read.table(text=gsub(',','',ll[seq(3)]),sep=':'),
## find where the data begin using grep
data= read.table(text=ll[-seq(grep("ID",ll)-1)],header=TRUE,sep=','))
# $settings
# V1 V2
# 1 experiment name test1
# 2 date 01/01/2015
# 3 protocol test2
#
# $data
# ID name value
# 1 A1 AA 0.10
# 2 A2 BB 0.16
# 3 A3 CC 0.12
# 4 A4 DD 0.45
# 5 A5 EE 0.35
# 6 B1 FF 1.20
# 7 B2 GG 1.12
# 8 B3 HH 1.05
# 9 B4 II 1.21
# 10 B5 JJ 1.18
Upvotes: 2