Reputation: 79
I have two .csv files which I read in as dataframes.
1)
Tree1 var1 var2 var3
Tree2 var1 var2 var3
2)
Tree1 year1 year2 year3
Tree2 year1 year2 year3
Desired output:
3)
Tree1 var1 var2 var3 year1
Tree1 var1 var2 var3 year2
Tree1 var1 var2 var3 year3
Tree2 var1 var2 var3 year1
Tree2 var1 var2 var3 year2
Tree2 var1 var2 var3 year3
As you can see, I need to copy the rows of the first dataframe and then add only one year of the second dataframe so that all the years appear in one column.
Upvotes: 0
Views: 86
Reputation: 38510
In base R, you can use merge
after reshaping df2 long. Using the data from dsen's post:
merge(df1,
reshape(df2, direction="long", idvar="x1", varying=list(2:4))[-2],
by="x1")
this returns
x1 x2.x x3 x4 x2.y
1 Tree1 var1 var2 var3 year1
2 Tree1 var1 var2 var3 year2
3 Tree1 var1 var2 var3 year3
4 Tree2 var1 var2 var3 year1
5 Tree2 var1 var2 var3 year2
6 Tree2 var1 var2 var3 year3
You can clean up the names with names <- c(...)
or wrap the above code in setNames
.
Upvotes: 2
Reputation: 39154
A solution uses functions from dplyr
and tidyr
, which is almost the same as @D.sen's solution. But I put every operation in one pipeline and uses left_join
to do the merge
function. I tried to use select
and arrange
to mimic your desired output, but these are all optional. dt3
is the final output.
# Create example datasets
dt1 <- read.table(text = "Tree1 var1 var2 var3
Tree2 var1 var2 var3",
header = FALSE, stringsAsFactors = FALSE)
dt2 <- read.table(text = "Tree1 year1 year2 year3
Tree2 year1 year2 year3",
header = FALSE, stringsAsFactors = FALSE)
# Load packages
library(dplyr)
library(tidyr)
# Process the data
dt3 <- dt2 %>%
gather(Value, Year, V2:V4) %>%
left_join(dt1, by = "V1") %>%
select(c(paste0("V", 1:4), "Year")) %>%
arrange(V1)
# See the output
dt3
V1 V2 V3 V4 Year
1 Tree1 var1 var2 var3 year1
2 Tree1 var1 var2 var3 year2
3 Tree1 var1 var2 var3 year3
4 Tree2 var1 var2 var3 year1
5 Tree2 var1 var2 var3 year2
6 Tree2 var1 var2 var3 year3
Upvotes: 2
Reputation: 922
First read in data. Replicated datasets below with ambiguous col names.
df1 <- data.frame(x1 = c("Tree1", "Tree2"),
x2 = c("var1", "var1"),
x3 = c("var2", "var2"),
x4 = c("var3", "var3"))
df2 <- data.frame(x1 = c("Tree1", "Tree2"),
x2 = c("year1", "year1"),
x3 = c("year2", "year2"),
x4 = c("year3", "year3"))
Then load the tidyr package.
install.packages('tidyr')
library(tidyr)
Then gather your second dataset.
df2 <- gather(df2,
key = "id",
value = "year",
-x1)
Finally, merge the two datasets.
df <- merge(df2, df1, by = 'x1')
Drop the id
column if necessary.
df$id <- NULL
Output looks like...
> df
x1 year x2 x3 x4
1 Tree1 year1 var1 var2 var3
2 Tree1 year2 var1 var2 var3
3 Tree1 year3 var1 var2 var3
4 Tree2 year1 var1 var2 var3
5 Tree2 year2 var1 var2 var3
6 Tree2 year3 var1 var2 var3
Upvotes: 4