Reputation: 888
I have a dataframe like this:
|DOI | WoS| Scopus| Dim| WoS_Year| Scopus_Year| Dim_Year|
|:-----------------------------|---:|------:|---:|--------:|-----------:|--------:|
|10.1515/jag-2017-0010 | NA| 1| 1| NA| 2017| 2017|
|10.1007/978-3-662-55771-6_9 | NA| NA| 1| NA| NA| 2020|
|10.1088/1361-6668/30/2/024004 | 1| 1| NA| 2017| 2017| NA|
|10.3390/ma12010124 | 1| 1| NA| 2019| 2019| NA|
|10.1002/ppsc.201700109 | 1| 1| 1| 2017| 2017| 2017|
I want to split it into a list of 3 dataframes (list_of_df
), whereby:
list_of_df$WoS
should have all DOI
that have WoS = 1
, and a column Year
that resembles the old WoS_Year
;list_of_df$Scopus
should have all DOI
that have Scopus= 1
, and a column Year
resembling the old Scopus_Year
;list_of_df$Dim
should have all DOI
that have Dim= 1
, and a column Year
resembling the old Dim_Year
.(There are actually multiple columns starting with WoS_*
or Scopus_*
or Dim_*
, each of which I wish to keep in the new respective list_of_df
, but by dropping the starting strings like WoS_
etc.
E.g., all columns that are starting_with("Scopus_")
should be in list_of_df$Scopus
but without Scopus_
in the column name.)
What would be the best approach to reach this goal?
My attempts with list_of_df <- split(df, with(df, interaction(WoS,Scopus,Dim)), drop = TRUE)
or with dplyr::nest(df, WoS:Dim)
did not bear fruits...
Thank you for your help!
> dput(df)
structure(list(DOI = c("10.1515/jag-2017-0010", "10.1007/978-3-662-55771-6_9",
"10.1088/1361-6668/30/2/024004", "10.3390/ma12010124", "10.1002/ppsc.201700109"
), WoS = c(NA, NA, 1L, 1L, 1L), Scopus = c(1L, NA, 1L, 1L, 1L
), Dim = c(1L, 1L, NA, NA, 1L), WoS_Year = c(NA, NA, 2017L, 2019L,
2017L), Scopus_Year = c(2017L, NA, 2017L, 2019L, 2017L), Dim_Year = c(2017L,
2020L, NA, NA, 2017L)), row.names = c(2186L, 9505L, 12281L, 11882L,
874L), class = "data.frame")
Upvotes: 0
Views: 141
Reputation: 10375
Another one in base R, if this is the expected output
res=list()
for (k in c("WoS","Scopus","Dim")) {
res[[k]]=df[df[,k]==1 & !is.na(df[,k]),grepl(k,colnames(df)) | c(TRUE,rep(FALSE,ncol(df)-1))]
colnames(res[[k]])=gsub(paste0(k,"_"),"",colnames(res[[k]]))
}
$WoS
DOI WoS Year
12281 10.1088/1361-6668/30/2/024004 1 2017
11882 10.3390/ma12010124 1 2019
874 10.1002/ppsc.201700109 1 2017
$Scopus
DOI Scopus Year
2186 10.1515/jag-2017-0010 1 2017
12281 10.1088/1361-6668/30/2/024004 1 2017
11882 10.3390/ma12010124 1 2019
874 10.1002/ppsc.201700109 1 2017
$Dim
DOI Dim Year
2186 10.1515/jag-2017-0010 1 2017
9505 10.1007/978-3-662-55771-6_9 1 2020
874 10.1002/ppsc.201700109 1 2017
Upvotes: 1
Reputation: 79288
In base R you could do:
df1 <- subset(reshape(df, matrix(2:ncol(df),2, byrow=TRUE), dir="long", idvar = "DOI",
times = c("WoS","Scopus","Dim")), WoS==1)
rownames(df1)<-NULL
split(df1, df1$time)
$Dim
DOI time WoS WoS_Year
8 10.1515/jag-2017-0010 Dim 1 2017
9 10.1007/978-3-662-55771-6_9 Dim 1 2020
10 10.1002/ppsc.201700109 Dim 1 2017
$Scopus
DOI time WoS WoS_Year
4 10.1515/jag-2017-0010 Scopus 1 2017
5 10.1088/1361-6668/30/2/024004 Scopus 1 2017
6 10.3390/ma12010124 Scopus 1 2019
7 10.1002/ppsc.201700109 Scopus 1 2017
$WoS
DOI time WoS WoS_Year
1 10.1088/1361-6668/30/2/024004 WoS 1 2017
2 10.3390/ma12010124 WoS 1 2019
3 10.1002/ppsc.201700109 WoS 1 2017
You could change the column names to match what you want
Another way:
lapply(split.default(df[-1],sub("_.*","",names(df[-1]))),
function(x)na.omit(cbind(df[1], x)[x[[1]]==1,]))
$Dim
DOI Dim Dim_Year
2186 10.1515/jag-2017-0010 1 2017
9505 10.1007/978-3-662-55771-6_9 1 2020
874 10.1002/ppsc.201700109 1 2017
$Scopus
DOI Scopus Scopus_Year
2186 10.1515/jag-2017-0010 1 2017
12281 10.1088/1361-6668/30/2/024004 1 2017
11882 10.3390/ma12010124 1 2019
874 10.1002/ppsc.201700109 1 2017
$WoS
DOI WoS WoS_Year
12281 10.1088/1361-6668/30/2/024004 1 2017
11882 10.3390/ma12010124 1 2019
874 10.1002/ppsc.201700109 1 2017
Upvotes: 1