Getting dataframe in right format for cluster analysis

Using an example R dataset, loaded with data("USArrests"), head(USArrests) gives the following results:

            Murder Assault UrbanPop Rape
Alabama      13.2     236       58 21.2
Alaska       10.0     263       48 44.5
Arizona       8.1     294       80 31.0
Arkansas      8.8     190       50 19.5
California    9.0     276       91 40.6
Colorado      7.9     204       78 38.7

When I use str(USArrests) the following results come up:

'data.frame':   50 obs. of  4 variables:
  $ Murder  : num  13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
  $ Assault : int  236 263 294 190 276 204 110 238 335 211 ...
  $ UrbanPop: int  58 48 80 50 91 78 77 72 80 60 ...
  $ Rape    : num  21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...

str reports only 4 variables, even though the printed output shows another column with the state names (with no column header). How do I get my own data into this format, so that the first column does not appear when I use the str function? I have a list of countries that I'm trying to cluster, but I can't use the scale function because the first column is not numeric, and I can't just create a new dataframe without that column, because the countries are what I'm trying to cluster.

Upvotes: 0

Views: 646

Answers (1)

divibisan

Reputation: 12155

It appears that the state names are rownames, instead of a full column. You can convert rownames to a column with:

USArrests <- cbind(State = rownames(USArrests), USArrests)

or convert a column to rownames:

rownames(df) <- df$states
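For the clustering case in the question, the column-to-rownames direction is probably the one that matters. Here is a minimal sketch, assuming a small made-up data frame (the `country`, `gdp` and `pop` columns and their values are hypothetical, purely for illustration). Once the names are rownames and the text column is dropped, every remaining column is numeric, so scale() works and the names are carried through as labels:

```r
# Hypothetical data: country names stored in an ordinary column
df <- data.frame(
  country = c("A", "B", "C", "D"),
  gdp     = c(1.2, 3.4, 2.2, 5.1),
  pop     = c(10, 40, 25, 60),
  stringsAsFactors = FALSE
)

# Move the names into rownames, then drop the non-numeric column
rownames(df) <- df$country
df$country <- NULL

# Only numeric columns remain, so scale() no longer fails
scaled <- scale(df)

# Hierarchical clustering; the rownames are kept as labels
hc <- hclust(dist(scaled))
groups <- cutree(hc, k = 2)
print(groups)
```

After this, str(df) lists only the numeric variables, as it does for USArrests, while plot(hc) labels each leaf with the country name.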

The tibble package also includes two useful functions for this: rownames_to_column() and column_to_rownames().
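As a rough illustration of those two tibble helpers, assuming the tibble package is installed (the column name "State" is my own choice, not something defined by the dataset):

```r
library(tibble)

data("USArrests")

# Rownames become a regular first column called "State"
with_col <- rownames_to_column(USArrests, var = "State")

# And back again: the "State" column becomes the rownames
back <- column_to_rownames(with_col, var = "State")

str(with_col)  # now lists State alongside the numeric variables
str(back)      # only the four numeric variables, as in the original
```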

Upvotes: 2
