BruderK
BruderK

Reputation: 79

Combining Two Dataframes so that all the years appear in one column

I have two .csv files which I read in as dataframes.

1)

Tree1  var1  var2  var3
Tree2  var1  var2  var3


2)

Tree1 year1  year2  year3
Tree2 year1  year2  year3

Desired output:

3)

Tree1  var1  var2  var3  year1
Tree1  var1  var2  var3  year2
Tree1  var1  var2  var3  year3
Tree2  var1  var2  var3  year1
Tree2  var1  var2  var3  year2
Tree2  var1  var2  var3  year3

As you can see, I need to copy the rows of the first dataframe and then add only one year of the second dataframe so that all the years appear in one column.

Upvotes: 0

Views: 86

Answers (3)

lmo
lmo

Reputation: 38510

In base R, you can use merge after reshaping df2 long. Using the data from dsen's post:

merge(df1,
      reshape(df2, direction="long", idvar="x1", varying=list(2:4))[-2],
      by="x1")

this returns

     x1 x2.x   x3   x4  x2.y
1 Tree1 var1 var2 var3 year1
2 Tree1 var1 var2 var3 year2
3 Tree1 var1 var2 var3 year3
4 Tree2 var1 var2 var3 year1
5 Tree2 var1 var2 var3 year2
6 Tree2 var1 var2 var3 year3

You can clean up the names with names <- c(...) or wrap the above code in setNames.

Upvotes: 2

www
www

Reputation: 39154

A solution uses functions from dplyr and tidyr, which is almost the same as @D.sen's solution. But I put every operation in one pipeline and uses left_join to do the merge function. I tried to use select and arrange to mimic your desired output, but these are all optional. dt3 is the final output.

# Create example datasets
dt1 <- read.table(text = "Tree1  var1  var2  var3
Tree2  var1  var2  var3",
                  header = FALSE, stringsAsFactors = FALSE)

dt2 <- read.table(text = "Tree1 year1  year2  year3
Tree2 year1  year2  year3",
                  header = FALSE, stringsAsFactors = FALSE)

# Load packages
library(dplyr)
library(tidyr)

# Process the data
dt3 <- dt2 %>%
  gather(Value, Year, V2:V4) %>%
  left_join(dt1, by = "V1") %>%
  select(c(paste0("V", 1:4), "Year")) %>%
  arrange(V1)

# See the output
dt3
     V1   V2   V3   V4  Year
1 Tree1 var1 var2 var3 year1
2 Tree1 var1 var2 var3 year2
3 Tree1 var1 var2 var3 year3
4 Tree2 var1 var2 var3 year1
5 Tree2 var1 var2 var3 year2
6 Tree2 var1 var2 var3 year3

Upvotes: 2

D.sen
D.sen

Reputation: 922

First read in data. Replicated datasets below with ambiguous col names.

df1 <- data.frame(x1 = c("Tree1", "Tree2"),
           x2 = c("var1", "var1"),
           x3 = c("var2", "var2"),
           x4 = c("var3", "var3"))
df2 <- data.frame(x1 = c("Tree1", "Tree2"),
           x2 = c("year1", "year1"),
           x3 = c("year2", "year2"),
           x4 = c("year3", "year3"))

Then load the tidyr package.

install.packages('tidyr')
library(tidyr)

Then gather your second dataset.

df2 <- gather(df2,
      key = "id",
      value = "year",
      -x1)

Finally, merge the two datasets.

df <- merge(df2, df1, by = 'x1')

Drop the id column if necessary.

df$id <- NULL

Output looks like...

> df
     x1  year   x2   x3   x4
 1 Tree1 year1 var1 var2 var3
 2 Tree1 year2 var1 var2 var3
 3 Tree1 year3 var1 var2 var3
 4 Tree2 year1 var1 var2 var3
 5 Tree2 year2 var1 var2 var3
 6 Tree2 year3 var1 var2 var3

Upvotes: 4

Related Questions