user5802890

Plot non-numeric data

How do we plot non-numeric data in R? I want to plot aa against bb (using any graph type, for example a boxplot or histogram), with bb on the x-axis and aa on the y-axis.

class(aa)
# [1] "character"
class(bb)
# [1] "character"

Upvotes: 2

Views: 27792

Answers (1)

steveb

Reputation: 5532

You can use dplyr and ggplot2 for this.

Assuming the input you provided is in df (see the bottom of this post for the input data):

library(dplyr)
library(ggplot2)  # the package is named 'ggplot2'; ggplot() is its main function

## Assuming the data is in the file 'Types.csv'
df <- read.csv('Types.csv')

df_summary <-
    df                          %>% # Pipe df into group_by
    group_by(type)              %>% # grouping by 'type' column
    summarise(name_count = n())     # calculate the name count for each group
## 'df_summary' now contains the summary data for each 'type'
df_summary

##     type name_count
##    (chr)      (int)
##1     dos          6
##2  normal          1
##3   probe          4
##4     r2l          8
##5     u2r          4
##6 unknown          1

### Two ways to plot using ggplot

## (1) Plot pre-summarized data: 'df_summary'.
ggplot(df_summary, aes(type, name_count)) +
    geom_bar(stat = 'identity')              # stat='identity' is used for summarized data.

## (2) Bar plot on original data frame (not summarised)
ggplot(df, aes(type))      +
    geom_bar()             + # 'stat' isn't needed here.
    labs(y = 'name_count')

Here is the plot of df_summary

[Image: bar plot of name_count by type]
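For readers who want to see what the group_by/summarise step computes without loading dplyr, here is a minimal base R sketch. It is not part of the original answer: the `type` vector is rebuilt from the counts printed above, and `df_summary_base` is a name made up for illustration.

```r
# Rebuild the 'type' column from the counts shown in the summary output
# (assumption: only the per-type counts matter here, not the row order).
type <- rep(c("dos", "normal", "probe", "r2l", "u2r", "unknown"),
            times = c(6, 1, 4, 8, 4, 1))
df <- data.frame(type = type, stringsAsFactors = FALSE)

# table() counts how often each distinct string occurs;
# as.data.frame() turns that into one row per category.
counts <- table(type = df$type)
df_summary_base <- as.data.frame(counts,
                                 responseName = "name_count",
                                 stringsAsFactors = FALSE)
df_summary_base
```

The result has the same two columns (type, name_count) as df_summary, so it could be passed to the same plotting calls.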

You can also do the following to add labels and a plot title (plot results not shown):

ggplot(df, aes(type)) +
    geom_bar() +
    labs(x = 'Type', y = 'Count') +
    ggtitle('Type Counts')

To add text labels just above the bars (in this case, the frequency of each category), geom_text can be used as below (plot results not shown).

ggplot(df_summary, aes(type, name_count)) +
    geom_bar(stat = 'identity') +
    geom_text(aes(label = name_count), vjust = -1) +
    ggtitle('Type Counts')

## OR

ggplot(df, aes(type)) +
    geom_bar() +
    labs(x = 'Type', y = 'Count') +
    geom_text(stat = 'count', aes(label = after_stat(count)), vjust = -1) + # after_stat() needs ggplot2 >= 3.3.0; older versions use ..count..
    ggtitle('Type Counts')
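If installing ggplot2 is not an option, a labelled bar plot along the same lines can be sketched in base R. This is an alternative technique, not the method used above; the `type` vector is again rebuilt from the counts in the summary, and `mids` is just an illustrative variable name.

```r
# Rebuild the 'type' column from the counts shown in the summary output.
type <- rep(c("dos", "normal", "probe", "r2l", "u2r", "unknown"),
            times = c(6, 1, 4, 8, 4, 1))
counts <- table(type)

# barplot() draws one bar per category and returns the x positions
# of the bar midpoints, which text() can reuse to place the labels.
mids <- barplot(counts,
                xlab = "Type", ylab = "Count", main = "Type Counts",
                ylim = c(0, max(counts) * 1.15))  # headroom for the labels

# pos = 3 places each label just above its (x, y) coordinate.
text(x = mids, y = as.vector(counts), labels = as.vector(counts), pos = 3)
```

The extra ylim headroom plays the same role as vjust = -1 in the ggplot2 version: it keeps the label of the tallest bar inside the plot area.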

Input data

df <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
         name    type
           back     dos
buffer_overflow     u2r
      ftp_write     r2l
   guess_passwd     r2l
           imap     r2l
        ipsweep   probe
           land     dos
     loadmodule     u2r
       multihop     r2l
        neptune     dos
           nmap   probe
           perl     u2r
            phf     r2l
            pod     dos
      portsweep   probe
        rootkit     u2r
          satan   probe
          smurf     dos
            spy     r2l
       teardrop     dos
    warezclient     r2l
    warezmaster     r2l
         normal  normal
        unknown unknown')
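The question asks about two character vectors, aa on the y-axis and bb on the x-axis. One possible reading is a count of each aa value within each bb value, which can be sketched in base R with a two-way table. The values of `aa` and `bb` below are hypothetical stand-ins, since the question does not show the actual data.

```r
# Hypothetical stand-ins for the question's character vectors.
aa <- c("low", "high", "low", "mid", "high", "low")
bb <- c("x", "x", "y", "y", "z", "z")

# Two-way table: rows are the aa values, columns are the bb values.
tab <- table(aa, bb)
tab

# For a matrix-like table, barplot() draws one stacked bar per column (bb);
# the segments within each bar are the row (aa) counts.
barplot(tab, legend.text = TRUE, xlab = "bb", ylab = "Count")
```

Whether a stacked bar plot is the right picture depends on what aa and bb actually hold, so treat this as a starting point rather than a general answer.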

Upvotes: 2
