Amleto

Reputation: 584

Summarize with character type conditions in dplyr

I would like to count the number of times a country is listed alone and the number of times it is listed together with other countries.

This is a sample of my dataset:

address_countries2
name_countries      n_countries
China               1                      
China               1
Usa                 1                        
Usa                 1
China France        2               
China France        2
India               1                      
India               1
Jordan Germany      2             

I used the following code to count how many times each country appears:

library(dplyr)
library(tidytext)  # unnest_tokens()

publication_countries <- address_countries2 %>% 
  select(name_countries, n_countries) %>% 
  unnest_tokens(word, name_countries) %>%   # one row per country
  group_by(word) %>% 
  summarise(TP = n())                       # total appearances per country

 head(publication_countries)
 # A tibble: 6 x 2
    word          TP
    <chr>       <int>
   1 China         4
   2 Usa           2
   3 France        2
   4 India         2
   5 Jordan        1       
   6 Germany       1

I would like to add one column with the number of rows in which a country is listed on its own, and a second column with the number of rows in which it is listed together with other countries.

Desired output (something like this):

 head(publication_countries)
 # A tibble: 6 x 4
    word          TP      single_times      with_other_countries
    <chr>       <int>            <int>                     <int>   
   1 China         4                2                         2
   2 Usa           2                2                         0
   3 France        2                0                         2
   4 India         2                2                         0
   5 Jordan        1                0                         1
   6 Germany       1                0                         1

From this link I have seen a possible way to summarise with a condition. However, I think I need something other than sum() in my case, because my conditional object is a character column (word):

summarise(TP = n() , IP = count(word[n_countries=="1"])) 

But I get this error:

Error in summarise_impl(.data, dots) : 
  Evaluation error: no applicable method for 'groups' applied to an object of class "character"
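A note on the error: count() is a dplyr verb that expects a data frame, so passing it the character vector word[n_countries == "1"] inside summarise() fails. sum() is actually fine here, because n_countries == 1 is a logical vector and sum() counts its TRUE values regardless of the type of word. A minimal sketch, assuming the sample rows above with n_countries stored as numbers; it swaps unnest_tokens() for tidyr::separate_rows() to split the names on spaces, which also keeps the original capitalisation:

```r
library(dplyr)
library(tidyr)

# Sample rows from the question (numeric n_countries is an assumption)
address_countries2 <- data.frame(
  name_countries = c("China", "China", "Usa", "Usa",
                     "China France", "China France",
                     "India", "India", "Jordan Germany"),
  n_countries = c(1, 1, 1, 1, 2, 2, 1, 1, 2),
  stringsAsFactors = FALSE
)

publication_countries <- address_countries2 %>%
  separate_rows(name_countries, sep = " ") %>%     # one row per country
  rename(word = name_countries) %>%
  group_by(word) %>%
  summarise(
    TP = n(),                                      # all appearances
    single_times = sum(n_countries == 1),          # rows listing only this country
    with_other_countries = sum(n_countries > 1)    # rows shared with other countries
  )

publication_countries
```

This gives the same counts as the desired output, with the rows sorted alphabetically by country.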

Any help would be appreciated :)

Many thanks

Upvotes: 0

Views: 3053

Answers (1)

Onyambu

Reputation: 79288

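library(dplyr)
library(tidyr)     # spread()
library(tidytext)  # unnest_tokens()

address_countries2 %>% 
   select(name_countries, n_countries) %>% 
   unnest_tokens(word, name_countries) %>%                # one row per country (lowercased by default)
   group_by(word) %>% mutate(TP = n()) %>%                # total appearances per country
   group_by(n_countries, word) %>% mutate(Tp1 = n()) %>%  # appearances per n_countries value
   unique() %>% spread(n_countries, Tp1, 0)               # one column per n_countries value, missing -> 0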
# A tibble: 6 x 4
# Groups:   word [6]
     word    TP   `1`   `2`
*   <chr> <int> <dbl> <dbl>
1   china     4     2     2
2  france     2     0     2
3 germany     1     0     1
4   india     2     2     0
5  jordan     1     0     1
6     usa     2     2     0
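spread() still works but has been superseded by pivot_wider() since tidyr 1.0.0. A sketch of the same reshaping with pivot_wider(), assuming the sample rows from the question and again splitting the names on spaces with separate_rows() rather than unnest_tokens(); the column prefix n_countries_ is an arbitrary choice:

```r
library(dplyr)
library(tidyr)

# Sample rows from the question (numeric n_countries is an assumption)
address_countries2 <- data.frame(
  name_countries = c("China", "China", "Usa", "Usa",
                     "China France", "China France",
                     "India", "India", "Jordan Germany"),
  n_countries = c(1, 1, 1, 1, 2, 2, 1, 1, 2),
  stringsAsFactors = FALSE
)

res <- address_countries2 %>%
  separate_rows(name_countries, sep = " ") %>%   # one row per country
  rename(word = name_countries) %>%
  count(word, n_countries) %>%                    # rows per country and listing size
  group_by(word) %>%
  mutate(TP = sum(n)) %>%                         # total appearances per country
  ungroup() %>%
  pivot_wider(names_from = n_countries, values_from = n,
              names_prefix = "n_countries_",
              values_fill = list(n = 0L))         # absent combinations become 0

res
```

Counting first with count() and then widening avoids the unique() step, since each country/listing-size pair is already a single row before pivoting.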

Upvotes: 2
