Reputation: 13070
This is a "let's try another way" post that is related to this one:
Is it possible to define/modify a reading function that can handle the fact that data stored in an xlsx file is row-based (i.e. each row represents a variable) and transforms it accordingly so it can be stored in a column-based data.frame (i.e. what used to be a row in xlsx becomes a column) while capturing the underlying class/data type of the row-based variables?
Regarding csv files I would probably start with turning to readLines, but unfortunately xlsx is still a black box to me.
Here's a little xlsx file that features examples for both data orientations: https://github.com/rappster/stackoverflow/blob/master/excel/row-and-column-based-data.xlsx
Upvotes: 1
Views: 3386
Reputation: 1206
You can also try this utility, which is installed from GitHub, with the following code:
install.packages("remotes")
remotes::install_github("atusy/mytools")
library(mytools)
my_df <- read_excel2("my_excel_file.xlsx", sheet = 1, transposing = TRUE, error_as_NA = TRUE, rm_blank_col = TRUE)
My Excel sheet had the desired column headers in the second column, so after transposing they ended up in the first row of the data. I fixed that with janitor's row_to_names:
library(janitor)
# promote the first row to column names and drop it from the data
my_df <- row_to_names(my_df, row_number = 1)
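For readers without janitor, the same header promotion can be sketched in base R. The data frame x below is made-up illustration data, not the sheet from the question, and all columns stay character after this step.

```r
# Toy data frame whose first row holds the intended column names
# (hypothetical values, for illustration only)
x <- data.frame(V1 = c("var_1", "1", "2"),
                V2 = c("var_2", "a", "b"),
                stringsAsFactors = FALSE)

# Promote the first row to column names, then drop that row
names(x) <- as.character(unlist(x[1, ]))
x <- x[-1, , drop = FALSE]
rownames(x) <- NULL  # renumber the remaining rows from 1
```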
Upvotes: 1
Reputation: 24178
What about slightly modifying the read.xlsx function from the xlsx package:
library(xlsx)
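read.transposed.xlsx <- function(file, sheetIndex) {
  # Read without a header row: each sheet row holds one variable
  df <- read.xlsx(file, sheetIndex = sheetIndex, header = FALSE)
  # Drop the name column, then transpose so each variable becomes a column
  dft <- as.data.frame(t(df[-1]), stringsAsFactors = FALSE)
  # The first sheet column holds the variable names (may be read as a factor)
  names(dft) <- as.character(df[, 1])
  # t() yields character data, so re-infer each column's type;
  # as.is = TRUE keeps strings as character instead of factor
  dft <- as.data.frame(lapply(dft, type.convert, as.is = TRUE),
                       stringsAsFactors = FALSE)
  dft
}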
# Let's test it
read.transposed.xlsx("row-and-column-based-data.xlsx", sheetIndex = 2)
# variable var_1 var_2 var_3
#1 2016-01-01 1 a TRUE
#2 2016-01-02 2 b FALSE
#3 2016-01-03 3 c TRUE
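Note that type.convert only infers logical, integer, numeric, complex and character/factor types, so the date column is not turned into a Date. A minimal sketch of the extra step, assuming the dates arrive as ISO-formatted strings as in the output above; the data frame res is a hand-built stand-in for the function's result, not output read from the actual file:

```r
# Hand-built stand-in for the result of read.transposed.xlsx()
# (values copied from the printed output; an assumption for illustration)
res <- data.frame(variable = c("2016-01-01", "2016-01-02", "2016-01-03"),
                  var_1 = 1:3,
                  var_2 = c("a", "b", "c"),
                  var_3 = c(TRUE, FALSE, TRUE),
                  stringsAsFactors = FALSE)

# Convert the date strings explicitly; type.convert() leaves them as text
res$variable <- as.Date(res$variable, format = "%Y-%m-%d")

sapply(res, class)
```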
Upvotes: 3
Reputation: 1606
> library(openxlsx)
> library(reshape)
> x=read.xlsx("row-and-column-based-data.xlsx",sheet = 2);
> x
variable 2016-01-01 2016-01-02 2016-01-03
1 var_1 1 2 3
2 var_2 a b c
3 var_3 TRUE FALSE TRUE
> y=t(x)
> colnames(y)=y[1,]
> y=y[2:nrow(y),]
> cc=data.frame(y, stringsAsFactors = F)
> cc
var_1 var_2 var_3
2016-01-01 1 a TRUE
2016-01-02 2 b FALSE
2016-01-03 3 c TRUE
> sapply(cc, class)
var_1 var_2 var_3
"character" "character" "character"
> write.csv(cc,"temp.csv")
> bb=read.csv("temp.csv") # re-reading the file lets read.csv infer the column types
> bb
X var_1 var_2 var_3
1 2016-01-01 1 a TRUE
2 2016-01-02 2 b FALSE
3 2016-01-03 3 c TRUE
> sapply(bb, class)
X var_1 var_2 var_3
"factor" "integer" "factor" "logical"
Or pass stringsAsFactors = FALSE (the default since R 4.0.0) if you prefer character columns over factors:
> bb=read.csv("temp.csv", stringsAsFactors = FALSE) # types are still inferred for the other columns
> bb
X var_1 var_2 var_3
1 2016-01-01 1 a TRUE
2 2016-01-02 2 b FALSE
3 2016-01-03 3 c TRUE
> sapply(bb, class)
X var_1 var_2 var_3
"character" "integer" "character" "logical"
Upvotes: 1
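The CSV round trip in the last answer exists only to trigger type inference. As a possible alternative, the same inference can be sketched in memory with utils::type.convert, which avoids the temporary file. The data frame cc below is rebuilt by hand from the transcript so the snippet runs without the xlsx file; that reconstruction is an assumption, not output from openxlsx.

```r
# Character-only data frame shaped like `cc` in the transcript
# (rebuilt by hand for illustration)
cc <- data.frame(var_1 = c("1", "2", "3"),
                 var_2 = c("a", "b", "c"),
                 var_3 = c("TRUE", "FALSE", "TRUE"),
                 row.names = c("2016-01-01", "2016-01-02", "2016-01-03"),
                 stringsAsFactors = FALSE)

# Re-infer each column's type in memory;
# as.is = TRUE keeps strings as character rather than factor
bb <- cc
bb[] <- lapply(cc, type.convert, as.is = TRUE)

classes <- sapply(bb, class)
```

Assigning into bb[] keeps the row names, so the dates stay attached to the rows instead of becoming an extra X column as they do after write.csv and read.csv.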