Gaspar

Reputation: 145

Scatter plot of variables whose data cover different years for each country

I'm trying to make a scatter plot of child mortality rate against child labor. The problem is that I don't have much data: some countries only have values for some years, and other countries only have data for other years. So I can't plot all the data together, and no single year has enough data to restrict the plot to that year alone. Is there a function that takes the last available value in the dataset for a given variable? For instance, if my most recent child labor figure for Germany is from 2015 and my most recent one for Italy is from 2014, and so on for the other countries, is there a way to plot the last value for each country?

My data look like this:

head(data2)
# A tibble: 6 x 5
  Entity      Code   Year mortality labor
  <chr>       <chr> <dbl>     <dbl> <dbl>
1 Afghanistan AFG    1962      34.5    NA
2 Afghanistan AFG    1963      33.9    NA
3 Afghanistan AFG    1964      33.3    NA
4 Afghanistan AFG    1965      32.8    NA
5 Afghanistan AFG    1966      32.2    NA
6 Afghanistan AFG    1967      31.7    NA

Never mind those NAs. The labor data just doesn't go back that far, but I do have it in the dataset for more recent years. The child mortality data, on the other hand, is fairly complete.

Thanks.

Upvotes: 0

Views: 536

Answers (2)

hugh-allan

Reputation: 1370

You might want to define what you mean by the 'last' value per group: the most recent, the last occurrence in the data, or something else?
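To see why that definition matters, here is a small sketch using hypothetical toy data (the Germany/Italy values are made up, not taken from data2) in which the rows are not stored in year order. dplyr's last() returns whatever row comes last within each group, so it only gives the most recent year once the data have been sorted, e.g. with arrange(Year).

```r
library(dplyr)

# Hypothetical toy data: Germany's rows are deliberately out of year order
toy <- data.frame(
  Entity = c("Germany", "Germany", "Italy", "Italy"),
  Year   = c(2015, 2012, 2010, 2014),
  labor  = c(1.5, 2.0, 3.0, 2.5)
)

# last() without sorting: picks whichever row happens to be last in each group
unsorted <- toy %>%
  group_by(Entity) %>%
  summarise(last_record = last(labor))

# last() after arrange(Year): picks the row with the most recent year
sorted <- toy %>%
  arrange(Year) %>%
  group_by(Entity) %>%
  summarise(last_record = last(labor))
```

Here Germany's unsorted "last" value comes from 2012, while the sorted one comes from 2015.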

dplyr::last picks out the last occurrence in the data, so you could use it along with arrange to order your data. In this example we sort the data by Year (ascending order by default), so the last observation will be the most recent. Assuming you don't want to include NA values, we also use filter to remove them from the data.

library(dplyr)

data2 %>%

   # first remove NAs from the data
   filter(
      !is.na(labor)
   ) %>%

   # then sort the data by Year
   arrange(Year) %>%

   # then extract the last observation per country
   group_by(Entity) %>%
   summarise(
      last_record = last(labor)
   )
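For the scatter plot you need both variables per country, not just labor. A minimal sketch of the same arrange() + last() idea, assuming you want mortality from the same year as the last labor value: drop rows where either variable is missing, then take the last row per country. The data frame below is hypothetical toy data in the shape of data2, not real figures.

```r
library(dplyr)

# Hypothetical toy data with the same columns as data2 (values are made up)
data2 <- data.frame(
  Entity    = c("Germany", "Germany", "Germany", "Italy", "Italy"),
  Code      = c("DEU", "DEU", "DEU", "ITA", "ITA"),
  Year      = c(2013, 2014, 2015, 2013, 2014),
  mortality = c(4.0, 3.9, 3.8, 3.6, 3.5),
  labor     = c(2.1, NA, 1.8, NA, 2.4)
)

latest <- data2 %>%
  # keep only rows where both variables are observed
  filter(!is.na(labor), !is.na(mortality)) %>%
  # sort so that last() means "most recent year"
  arrange(Year) %>%
  group_by(Entity) %>%
  # every surviving row has both values, so all three come from the same row
  summarise(
    Year      = last(Year),
    mortality = last(mortality),
    labor     = last(labor)
  )
```

Requiring both values in the same row keeps each point's x and y from the same year. If you would rather pair each country's latest labor value with its latest mortality value regardless of year, you could filter and summarise each variable separately and then join the two results by Entity.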

Upvotes: 0

Kra.P

Reputation: 15143

I can't tell which variable you want to plot, but the following code selects only the last year of each country.

    library(dplyr)

    data2 %>%
      group_by(Entity) %>%
      filter(Year == max(Year)) %>% 
      ungroup()

The result looks like this:

      Entity      Code   Year mortality labor
      <chr>       <chr> <dbl>     <dbl> <lgl>
    1 Afghanistan AFG    1967      31.7 NA  

Now you can plot any variable.
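As the output above shows, filter(Year == max(Year)) keeps the latest year even when labor is NA in that year. A sketch of the same approach that first drops rows missing either variable, followed by a base-graphics scatter plot with one point per country. The data frame is hypothetical toy data in the shape of data2, and the axis labels are illustrative.

```r
library(dplyr)

# Hypothetical toy data with the same columns as data2 (values are made up)
data2 <- data.frame(
  Entity    = c("Germany", "Germany", "Germany", "Italy", "Italy"),
  Code      = c("DEU", "DEU", "DEU", "ITA", "ITA"),
  Year      = c(2013, 2014, 2015, 2013, 2014),
  mortality = c(4.0, 3.9, 3.8, 3.6, 3.5),
  labor     = c(2.1, NA, 1.8, NA, 2.4)
)

latest <- data2 %>%
  # drop rows where either plotted variable is missing
  filter(!is.na(labor), !is.na(mortality)) %>%
  group_by(Entity) %>%
  # keep each country's most recent remaining year
  filter(Year == max(Year)) %>%
  ungroup()

# One point per country, labelled with its country code
plot(latest$labor, latest$mortality,
     xlab = "Child labor", ylab = "Child mortality",
     main = "Most recent year with both values, per country")
text(latest$labor, latest$mortality, labels = latest$Code, pos = 3)
```

With ggplot2, roughly the same plot would be ggplot(latest, aes(labor, mortality)) + geom_point().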

Upvotes: 1
