Reputation: 149
I have a geochemical data set that is has 50 columns and 16000 rows. I need to concatenate data in rows in the Field_Notes
column grouped by the Sample_ID
. I have tried using the aggregate function which works but I end up with a subset of only those two columns and duplicates deleted. I would be very grateful if anyone has any suggestions.
My data looks like this:
Sampe_ID Year Alt_Min Field_Notes X
1 97PQQT007 1997 Fe-Carb qtz sweel 2
2 97PQQT007 1997 Fe-Carb v. tr 2
3 97PQQT014 1997 <NA> qtz vn in bslt 15
4 97PQQT014 1997 <NA> 1-2% py,cpy 15
5 97PQQT006 1997 Fe-Carb qtz vn 2
6 97PQQT004 1997 Fe-Carb qtz vn 1
7 97PQQT004 1997 Fe-Carb none 1
But should look like this:
Sampe_ID Year Alt_Min Field_Notes X
1 97PQQT007 1997 Fe-Carb qtz sweel, v. tr 2
2 97PQQT007 1997 Fe-Carb qtz sweel, v. tr 2
3 97PQQT014 1997 <NA> qtz vn in bslt, 1-2% py,cpy 15
4 97PQQT014 1997 <NA> qtz vn in bslt, 1-2% py,cpy 15
5 97PQQT006 1997 Fe-Carb qtz vn 2
6 97PQQT004 1997 Fe-Carb qtz vn, none 1
7 97PQQT004 1997 Fe-Carb qtz vn, none 1
Here is a reproducible dataframe:
geochem <- data.frame(Sample_ID= c(1,1,2,2,3,4,4), Year = rep(1997, 7), Alt_Min = c(rep("Fe-Carb",2), rep(NA,2), rep("Fe-Carb",3)), Field_Notes = c("qtz sweel", "v. tr", "qtz vn in bslt", "1-2% py,cpy", "qtz vn", "qtz vn", "none"), x = c(2,2,15,15,2,1,1))
Upvotes: 3
Views: 2077
Reputation: 81683
You can use a combination of within
and ave
to transform the data frame.
within(geochem,
{Field_Notes <- as.character(Field_Notes);
Field_Notes <- ave(Field_Notes, Sample_ID, FUN = toString)})
The function toString
is used to paste strings together.
Note that the factor Field_Notes
is transformed to a string variable.
The result:
Sample_ID Year Alt_Min Field_Notes x
1 1 1997 Fe-Carb qtz sweel, v. tr 2
2 1 1997 Fe-Carb qtz sweel, v. tr 2
3 2 1997 <NA> qtz vn in bslt, 1-2% py,cpy 15
4 2 1997 <NA> qtz vn in bslt, 1-2% py,cpy 15
5 3 1997 Fe-Carb qtz vn 2
6 4 1997 Fe-Carb qtz vn, none 1
7 4 1997 Fe-Carb qtz vn, none 1
Upvotes: 3
Reputation: 4406
Without a reproducible dataframe, I can't be sure if this will work but it should. Use dplyr
here and replace df
with your dataframe name.
library(dplyr)
geochem <- data.frame(Sample_ID= c(1,1,2,2,3,4,4),
Year = rep(1997, 7), Alt_Min = c(rep("Fe-Carb",2), rep(NA,2), rep("Fe-Carb",3)),
Field_Notes = c("qtz sweel", "v. tr", "qtz vn in bslt",
"1-2% py,cpy", "qtz vn", "qtz vn", "none"),
x = c(2,2,15,15,2,1,1))
geochem %>% group_by(x,Sample_ID) %>% summarise(Field_Notes=paste(Field_Notes, collapse=","))
Here is the exact output I get:
x Sample_ID Field_Notes
<dbl> <dbl> <chr>
1 4 qtz vn,none
2 1 qtz sweel,v. tr
2 3 qtz vn
15 2 qtz vn in bslt,1-2% py,cpy
Upvotes: 2