Reputation: 107
I have a dataframe that has duplicate IDs with unique values. I need to make the IDs unique.
df <- read.csv("test.csv")
ID: A1, A1, A2, A2, A3, A3, A4, A4
Value: 0.5, 0.9, 1.5, 0.8, 2.2, 2.4, 3.1, 0.5
I need to get this dataframe:
ID: A1_1, A1_2, A2_1, A2_2, A3_1, A3_2, A4_1, A4_2
Value: 0.5, 0.9, 1.5, 0.8, 2.2, 2.4, 3.1, 0.5
I tried the following code which adds a column with repeating alternating _1 and _2 and concatenates to the ID:
unique <- c("_1", "_2")
Unique.col <- matrix(rep(unique, ))
unique_ID <- cbind(df, Unique.col)
unique_ID$ID <- paste(unique_ID$ID, unique_ID$Unique.col)
unique_ID
I get the following dataframe where there is a space between the A1 and _1:
ID: A1 _1, A1 _2, A2 _1, A2 _2, A3 _1, A3 _2, A4 _1, A4 _2
Value: 0.5, 0.9, 1.5, 0.8, 2.2, 2.4, 3.1, 0.5
Is there a better way to do this or a way to get rid of the space?
Upvotes: 1
Views: 215
Reputation: 51
A general dplyr
/tidyr
way to solve this is to leverage pivot_longer
and pivot_wider
together: lengthen, then group by the original column names and create unique within-group IDs, then widen back. This seems a little bit deviant from the usual spirit of pivoting, but it gets the job done!
Sample data:
df <- tribble(
~"A1", ~"A1", ~"A2", ~"A2", ~"A3", ~"A3", ~"A4", ~"A4",
1, 2, 3, 4, 5, 6, 7, 8
)
For creating the unique within-group IDs, see this answer. For combining those IDs with the original column names, the tidyr
pivoting vignette has some great examples (e.g. here). The key is to use the names_from
argument to combine the original column name with the new IDs. This gets us to:
df %>%
# Pivot original column names to "name" column and original values to "value" column
pivot_longer(cols=everything()) %>%
# Create unique IDs within each original column
group_by(name) %>%
mutate(row_id=row_number()) %>%
# Pivot back to the desired wider format
pivot_wider(names_from=c(name, row_id))
Output:
# A tibble: 1 x 8
A1_1 A1_2 A2_1 A2_2 A3_1 A3_2 A4_1 A4_2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2 3 4 5 6 7 8
We can also change the format of the new column names in the call to pivot_wider
, either with names_sep
(default is _
) or with names_pattern
(which takes a regular expression).
Upvotes: 1
Reputation: 854
You can use gsub(" ","",unique_ID)
to remove spaces
Example:
unique_ID <- c("A1 _1", "A1 _2", "A2 _1", "A2 _2", "A3 _1", "A3 _2", "A4 _1",
"A4 _2")
test <- gsub(" ","",unique_ID)
> test
[1] "A1_1" "A1_2" "A2_1" "A2_2" "A3_1" "A3_2" "A4_1"
[8] "A4_2"
Upvotes: 0