CarlosSR
CarlosSR

Reputation: 103

Merge dataframes using an extra condition r

I know it should be an easier or smarter way of doing what I need, but I haven't found it yet after several days.

I have 2 dataframes that I need to merge using a extra condition. For example:

df1 <- data.frame(Username = c("user1", "user2", "user3", "user4", "user5", "user6"))
df2 <- data.frame(File_Name = c(rep("StudyABC", 5), rep("AnotherStudyCDE", 4)), Username = c("user1", rep(c("user2", "user3", "user4", "user5"),2)))
print(df1)
print(df2)

What I need is to create 2 new columns in df1 called ABC and CDE that includes their "File_Name" values. Of course the real data is hundreds of lines and not ordered so no way of selecting by range.

One of the solutions (not elegant) that I have found is:

df2_filtered <- df2 %>% filter(str_detect(File_Name, "ABC"))
df1 <- left_join(df1, df2_filtered, by = "Username")
names(df1)[2] <- "ABC"

df2_filtered <- df2 %>% filter(str_detect(File_Name, "CDE"))
df1 <- left_join(df1, df2_filtered, by = "Username")
names(df1)[3] <- "CDE"

print(df1)

Is there a shortest way of doing it? Because I have to repeat the same logic 160 times.

Thanks

Upvotes: 1

Views: 56

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 388797

You can extract either "ABC" or "CDE" from File_Name and cast the data into wide format. We can join the data with df1 to get all the Username in the final dataframe.

library(dplyr)

df2 %>%
  mutate(name = stringr::str_extract(File_Name, 'ABC|CDE')) %>%
  tidyr::pivot_wider(names_from = name, values_from = File_Name) %>%
  right_join(df1, by = 'Username')

#  Username ABC      CDE            
#  <chr>    <chr>    <chr>          
#1 user1    StudyABC NA             
#2 user2    StudyABC AnotherStudyCDE
#3 user3    StudyABC AnotherStudyCDE
#4 user4    StudyABC AnotherStudyCDE
#5 user5    StudyABC AnotherStudyCDE
#6 user6    NA       NA             

Upvotes: 1

stlba
stlba

Reputation: 767

If one study can have more than one file path (which I assume is the case from your previous attempts), just converting your data to a wide format before joining won't work as you'll have one column per file path, not per study.

One method in this case could be to use a for-loop to create an additional column in df2 with the study name, then convert the data to a wide format using pivot_wider.

It's not a very R method though so I'd welcome suggestions to avoid creating the empty study column and the for-loop

studies <- c("ABC", "CDE")

#create empty column named "study"
df2 <- df2 %>% 
  mutate(study = NA_character_)

for (i in studies) {
df2 <- df2 %>% 
  mutate(study = if_else(grepl(i, File_Name), i, study))
}

df2 <- df2 %>% 
  pivot_wider(names_from = study, values_from = File_Name)
> df2
# A tibble: 5 x 3
  Username ABC      CDE            
  <chr>    <chr>    <chr>          
1 user1    StudyABC NA             
2 user2    StudyABC AnotherStudyCDE
3 user3    StudyABC AnotherStudyCDE
4 user4    StudyABC AnotherStudyCDE
5 user5    StudyABC AnotherStudyCDE

df2 is now in a wide format and you can join it to df1 as before to get your desired output.

df3 <- left_join(df1, df2)

Upvotes: 1

Jonny Phelps
Jonny Phelps

Reputation: 2717

What you're looking for is a way of casting data from long to wide eg using data.table package I would do this:

library(data.table)

# converts data.frame to data.table
dt <- as.data.table(df2)

# I copy the file_name so one is used for the pivotting for long to wide and the other is used for filling in the data
dt[, study := File_Name]
dt_wide <- dcast(Username~File_Name, data=dt, value.var = "study")

# have a look at df2 in wide format
dt_wide[]

# now its just a direct merge to pull it back in to df1 and turn 
# back in to data.frame for you
out <- merge(as.data.table(df1), dt_wide, by="Username", all.x=TRUE)
setDF(out)
out

Plenty of tutorials on melting/casting even without data.table. It's just knowing what to search for eg Google throws up https://ademos.people.uic.edu/Chapter8.html as the first result.

Upvotes: 1

Related Questions