Devin Mendez

Reputation: 101

Create new column based on characters

I have a matrix similar to this:

Data = matrix(
  c('Ruppia A', 'Ruppia B', 'Ruppia C', 'Hydrobia A', 'Dog A', 'Cat A', 'Fresh',
    'Fresh', 'Fresh','Fresh', 'Dirt', 'House'),
  nrow=6,
  ncol=2,
  byrow=FALSE
)

I would like to be able to group similar records together into one column without losing any data. Something like this:

New_Data = matrix(
  c('Ruppia A', 'Ruppia B', 'Ruppia C', 'Hydrobia A', 'Dog A', 'Cat A', 'Fresh',
    'Fresh', 'Fresh','Fresh', 'Dirt', 'House', 'Ruppia', 'Ruppia', 'Ruppia',
    'Ruppia', 'Dog', 'Cat'),
  nrow=6,
  ncol=3,
  byrow=FALSE
)

For some of the records we could simply group by genus (Ruppia), but not every grouping will be based solely on genus, and some may have to be combined. I am only interested in a handful of species for this analysis and don't necessarily need it to return all species. In this example we are not interested in 'Dog' and 'Cat', and they could be dropped if that makes it easier.

Upvotes: 1

Views: 69

Answers (3)

Yuriy Saraykin

Reputation: 8880

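An additional solution, using tidyr::extract. Its default regex captures the first run of alphanumeric characters (here the genus) into a new column named out: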

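library(dplyr)  # provides %>% and as_tibble()

Data %>%
  as_tibble() %>%  # unnamed matrix columns become V1 and V2
  tidyr::extract(V1, "out", remove = FALSE)  # keep V1 alongside the new column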

Upvotes: 1

akrun

Reputation: 887118

We can use str_remove to drop the trailing whitespace and capital letter:

library(dplyr)
library(stringr)
Data %>%
    as_tibble %>%
    mutate(V1_new = str_remove(V1, "\\s+[A-Z]$"))

Upvotes: 1

broti

Reputation: 1382

If your new column is like your first column, but with a capital letter after a space dropped (e.g. " A"), then you can simply do this:

library(dplyr)

Data <- as.data.frame(Data) # turn into data frame first

Data %>% mutate(V1_new = gsub(" [A-Z]$", "", V1))
          V1    V2   V1_new
1   Ruppia A Fresh   Ruppia
2   Ruppia B Fresh   Ruppia
3   Ruppia C Fresh   Ruppia
4 Hydrobia A Fresh Hydrobia
5      Dog A  Dirt      Dog
6      Cat A House      Cat
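
The question also mentions groupings that are not purely by genus (in New_Data, 'Hydrobia A' is placed in the 'Ruppia' group) and that 'Dog' and 'Cat' could be dropped. One possible way to handle that, sketched here in base R, is to strip the suffix as above and then map each genus to a group through a named vector. The names group_lookup, Group and Data_kept are hypothetical and only for illustration; the lookup contents are taken from the question's example, so adjust them to the real groupings.

```r
# Rebuild the question's example matrix as a data frame (columns become V1, V2).
Data <- as.data.frame(matrix(
  c('Ruppia A', 'Ruppia B', 'Ruppia C', 'Hydrobia A', 'Dog A', 'Cat A',
    'Fresh', 'Fresh', 'Fresh', 'Fresh', 'Dirt', 'House'),
  nrow = 6, ncol = 2, byrow = FALSE
), stringsAsFactors = FALSE)

# Hypothetical lookup: names are genera, values are the group they belong to.
# Genera that are missing from the lookup (Dog, Cat) will map to NA.
group_lookup <- c(Ruppia = "Ruppia", Hydrobia = "Ruppia")

# Strip the trailing " A", " B", ... to get the genus, then map genus to group.
genus <- sub(" [A-Z]$", "", Data$V1)
Data$Group <- unname(group_lookup[genus])

# Optionally keep only the rows that belong to a group of interest.
Data_kept <- Data[!is.na(Data$Group), ]
```

Keeping the mapping in one named vector means a new grouping only needs a new entry in group_lookup, and rows without an entry can be either kept as NA in Data or dropped as in Data_kept.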

Upvotes: 2
