antecessor

Reputation: 2800

Create a factor column from the shared text in the row names of a data frame in R

I have this example data frame.

df <- structure(list(PC1 = c(-0.0277818345657933, -0.0342426301759117, 
-0.0328199061848987, 0.0557338197779853, 0.042369402931087), 
    PC2 = c(-0.0149291182738773, -0.00862145986823889, -0.0101822421485786, 
    -0.00862630869071877, -0.00419434673647331)), row.names = c("Homo sapiens - ULAC-0968", 
"Homo sapiens - ULAC-0978", "Homo sapiens - ULAC-0996", "Pan troglodytes - HTB2804", 
"Pan troglodytes - HTB411"), class = "data.frame")

I would like to create an extra column, named Species, from the species part of the row names. In this case, the only factor levels would be Homo sapiens and Pan troglodytes.

How could I proceed?

Upvotes: 0

Views: 58

Answers (2)

Ronak Shah

Reputation: 389135

Base R option.

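Use sub() to drop everything from the first hyphen onward, trimws() to strip the leftover trailing space, and then remove the row names. The result is a character column; wrap it in factor() if a factor is needed.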

df$Species <- trimws(sub('-.*', '', rownames(df)))
rownames(df) <- NULL
df

#      PC1      PC2         Species
#1 -0.0278 -0.01493    Homo sapiens
#2 -0.0342 -0.00862    Homo sapiens
#3 -0.0328 -0.01018    Homo sapiens
#4  0.0557 -0.00863 Pan troglodytes
#5  0.0424 -0.00419 Pan troglodytes
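
Since the question asks for a factor, here is a minimal sketch of the same idea that returns one. It assumes every row name follows the pattern "<species> - <specimen id>", so it matches on the " - " separator (space, hyphen, space) rather than on a bare hyphen. That choice is an assumption on my part; it keeps a hyphen inside a species name from truncating the result. The row names are left in place here.

```r
# Example data copied from the question
df <- data.frame(
  PC1 = c(-0.0277818345657933, -0.0342426301759117, -0.0328199061848987,
          0.0557338197779853, 0.042369402931087),
  PC2 = c(-0.0149291182738773, -0.00862145986823889, -0.0101822421485786,
          -0.00862630869071877, -0.00419434673647331),
  row.names = c("Homo sapiens - ULAC-0968", "Homo sapiens - ULAC-0978",
                "Homo sapiens - ULAC-0996", "Pan troglodytes - HTB2804",
                "Pan troglodytes - HTB411")
)

# Drop the " - <specimen id>" suffix, then convert the result to a factor
df$Species <- factor(sub(" - .*$", "", rownames(df)))

levels(df$Species)
# [1] "Homo sapiens"    "Pan troglodytes"
```

If the row names are no longer needed afterwards, rownames(df) <- NULL resets them as in the answer above.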

Upvotes: 1

Samet Sökel

Reputation: 2670

library(tidyverse)

df %>%
  rownames_to_column(var = 'Species') %>%
  mutate(Species = sapply(strsplit(Species, split = ' -'),
                          function(x) as.factor(x[1])))

Output:

 Species             PC1      PC2
  <fct>             <dbl>    <dbl>
1 Homo sapiens    -0.0278 -0.0149 
2 Homo sapiens    -0.0342 -0.00862
3 Homo sapiens    -0.0328 -0.0102 
4 Pan troglodytes  0.0557 -0.00863
5 Pan troglodytes  0.0424 -0.00419
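
As a possible simplification of the pipeline above, the per-row sapply()/strsplit() step can be replaced by a single vectorised sub() call wrapped in factor(). This is only a sketch: it assumes the dplyr and tibble packages are installed and that every row name contains the " - " separator. The name result is introduced here purely for illustration.

```r
library(dplyr)
library(tibble)

# Example data copied from the question
df <- data.frame(
  PC1 = c(-0.0277818345657933, -0.0342426301759117, -0.0328199061848987,
          0.0557338197779853, 0.042369402931087),
  PC2 = c(-0.0149291182738773, -0.00862145986823889, -0.0101822421485786,
          -0.00862630869071877, -0.00419434673647331),
  row.names = c("Homo sapiens - ULAC-0968", "Homo sapiens - ULAC-0978",
                "Homo sapiens - ULAC-0996", "Pan troglodytes - HTB2804",
                "Pan troglodytes - HTB411")
)

result <- df %>%
  # Move the row names into a regular column called Species
  rownames_to_column(var = "Species") %>%
  # Strip the " - <specimen id>" suffix from the whole column at once
  mutate(Species = factor(sub(" - .*$", "", Species)))

levels(result$Species)
# [1] "Homo sapiens"    "Pan troglodytes"
```

Building the factor from the whole column in one call defines the levels once, rather than relying on sapply() to merge five single-level factors.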

Upvotes: 1
