Reputation: 11
I used map()and read_xlsx() to extract some data (from specific range in excel file) from 36 excel xlsx files into one file. Basically, I tweaked the codes from the blog.(https://serialmentor.com/blog/2016/6/13/reading-and-combining-many-tidy-data-files-in-R, part:Keeping auxilliary information about the files read).
It worked very well. I can get a nested dataframe. But the unnest() step did not work. I found out the reason is that the unnest() can not handle mixture of the different types. One column is number type in most lists, but the column is character type in one list. so unnest() could not work.
Is it possible I can let unnest() to convert all numbers in every file to characters so the unnest will work? if no way we can let unnest do it, are there other methods to unnest the data?
the system is windows, R version 3.5 other attached packages version:
[1] readxl_1.3.0 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.0.1
purrr_0.3.1
[6] readr_1.3.1 tidyr_0.8.3 tibble_2.0.1 ggplot2_3.1.0
tidyverse_1.2.1
after map() and read_xlsx():
the code:
file_path <- "files"
files_name <- dir(path = file_path, pattern = "*.xlsx", all.files =FALSE)
data <- data_frame(filename = files_name) %>% # create a data frame
# holding the file names
mutate(file_contents = map(filename, # read files into
~ read_xlsx(file.path(file_path, .), range =
"A56:J60")) # a new data column
) '
the data is:
> data
># A tibble: 36 x 2
> filename file_contents
> <chr> <list>
>1 JC04Feb19Run003.xlsx <tibble [4 x 10]>
>2 JC04Feb19Run004.xlsx <tibble [4 x 10]>
>3 JC06Feb19Run005.xlsx <tibble [4 x 10]>
>4 JC06Feb19Run006.xlsx <tibble [4 x 10]>
>5 JC07Feb2019Run007.xlsx <tibble [4 x 10]>
>6 JC07Feb2019Run008.xlsx <tibble [4 x 10]>
>7 JC11Feb2019Run009.xlsx <tibble [4 x 10]>
>8 JC12Feb2019Run011.xlsx <tibble [4 x 10]>
>9 JC12Feb2019Run012.xlsx <tibble [4 x 10]>
>10 JC13Feb2019Run015 (2).xlsx <tibble [4 x 10]>
># ... with 26 more rows
after unnest(data)
> unnest(data)
Error: Column `CON` can't be coverted from numeric to character
after str(data):
..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of 10
variables:
.. ..$ Sample : chr "LQC-1" NA "LQC-2" NA
.. ..$ Wells : chr "H3" "H4" "C9" "C10"
.. ..$ NOM pg/mL: num 50 NA 50 NA
.. ..$ SIGNAL : num 256 257 314 213
.. ..$ CON : num 98.8 99.1 112.5 87.7
.. ..$ XCON : num 99 NA 100 NA
.. ..$ SDCON : num 0.18 NA 17.5 NA
.. ..$ CVCON : num 0.2 NA 17.5 NA
.. ..$ %REC : num 198 NA 200 NA
.. ..$ STATUS : chr "FAIL" NA "FAIL" NA
..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of 10
variables:
.. ..$ Sample : chr "LQC-1" NA "LQC-2" NA
.. ..$ Wells : chr "H3" "H4" "C9" "C10"
. ..$ NOM pg/mL: num 50 NA 50 NA
.. ..$ SIGNAL : num 878 314 9 222
.. ..$ CON : chr "132.69999999999999" "48.68" "Range?"
"34.979999999999997"
.. ..$ XCON : num 90.7 NA 35 NA
.. ..$ SDCON : num 59.4 NA 0 NA
.. ..$ CVCON : num 65.5 NA 0 NA
.. ..$ %REC : num 181 NA 70 NA
.. ..$ STATUS : chr "FAIL" NA "PASS" NA
it showed that the CON Column is num, in 36 columns, while in one column is chr.
Upvotes: 1
Views: 1701
Reputation: 1632
Use colClasses = 'character'
in read_xlsx
and this should work.
or
col_types = 'text'
Upvotes: 1