Reputation: 49
Say I have a dataset like so:
state address
Business - AK - Fairbanks 1234 wherever street
Business - AK - Juneau 2234 wherever street
Business - AK - Anchorage 3234 wherever street
Business - AL - Montgomery 4234 wherever street
Business - AL - Birmingham 5234 wherever street
Business - AL - Birmingham 1234 wherever street
Business - AL - Birmingham 7234 wherever street
Business - AL - Greenville 6234 wherever street
Business - AZ - Parker 7234 wherever street
Business - AZ - Parker 5234 wherever street
Business - AZ - Phoenix 8234 wherever street
Business - AZ - Tucson 9234 wherever street
And so on and so forth, with variable state entries and etc, you get the idea. I want it to count the number of times each state appears in the 'state' column, regardless of city. Ideally, it ends up looking like so:
state total
AK 3
AL 5
AZ 4
So on and so forth, with the count being different for each state based on the entries, obviously. I'm having difficulty figuring out how, exactly, to make it so that it counts each state entry by its 2-letter code, though. I'm using dplyr
and tidyverse
, if it matters.
Upvotes: 0
Views: 58
Reputation: 21440
table(gsub(".*([A-Z]{2}).*", "\\1", df$state))
or, based on stringr
s function str_extract
and positive lookbehind and lookahead:
table(str_extract(df$state, "(?<=- )[A-Z]{2}(?= -)"))
With dplyr
:
df %>% mutate(state = str_extract(state, "(?<=- )[A-Z]{2}(?= -)")) %>% count(state)
Upvotes: 1
Reputation: 389255
You can get the state and use table
to count the frequency.
Use stack
if you want output as dataframe.
stack(table(sub('-.*', '', df$state)))[2:1]
# ind values
#1 AK 3
#2 AL 5
#3 AZ 4
Using dplyr
:
library(dplyr)
df %>% mutate(state = sub('-.*', '', state)) %>% count(state)
Upvotes: 0