Sanoj
Sanoj

Reputation: 300

advanced reshaping / pivoting in a r dataframe

I am struggling to reshape a dataframe in R. My starting point is a dataframe, which has the following structure:

df_given <- data.frame(
  first_column  = c("NA", "NA", "NA", "Country1", "Country2", "Country3"),
  second_column  = c("Consumption", "real", "2021", 10, 11, 23),
  third_column = c("Consumption", "real", "2022", 20, 22, 12),
  fourth_column = c("Inflation", "expected", "2021", 1, 1.2, 2.5),
  fifth_column = c("Inflation", "expected", "2022", 5, 3, 2)
)

Now my problem is the following: I would like to have the 2021 and 2022 only as two columns, instead of repeating the sequence two times. This, therefore, involves transforming the "description" of this time series (e.g. consumption real and inflation expected) from a row to a column. For this reason, my final target dataframe would look somehow like this:

df_target <- data.frame(
  first_column = c("type", "Country1 Consumption real", "Country2 Consumption real", 
                   "Country3 Consumption real", "Country1 Inflation expected", 
                   "Country2 Inflation expected","Country3 Inflation expected"),
  second_column = c(2021, 10, 11, 23, 1, 1.2, 2.5),
  third_column = c(2022, 20, 22, 12, 5, 3, 2)
)

I assume that pivoting to wider or longer would do the trick. However, my problem is, that I can't really tell if my current dataframe is actually in long or wide format, because I think it is kind of both. Can anyone tell me how to approach this problem? Thanks in advance

Upvotes: 0

Views: 59

Answers (3)

Sanoj
Sanoj

Reputation: 300

Thanks a lot for your input, your solutions were very helpful to find my own solution. For me, the most important take away was to merge all "identifying" rows into the header, which makes all of the following operations a lot easier.

# merge row 1 with 2
df_given[1,] <- paste(df_given[1,], df_given[2,])
df_given = df_given[-c(2),]

# merge "merged row" with daterow with unique seperator into a header
names(df_given) <- as.character(paste(df_given[1,], df_given[2,], sep ="-x-"))
df_given = df_given[-c(1,2),]
names(df_given)[1] <- 'country'

# pivot longer
df_given_new <- df_given %>%
   pivot_longer(!country, names_to = "identifier", values_to = "obs")

# split columns 
df_given_new[c('type', 'year')] <- str_split_fixed(df_given_new$identifier, '-x-', 2)
df_given_new <- subset(df_given_new, select=-c(identifier))

# back to long
dfFinal <- df_given_new %>%
  pivot_wider(names_from = year, values_from = obs)

Upvotes: 0

hello_friend
hello_friend

Reputation: 5788

Easiest way is manual, using Base R:

# Transpose: ir => data.frame
ir <- data.frame(t(df_given))

# Derive metrics: ir2 => character vector
ir2 <- apply(ir[,1:3], 1, paste, collapse = " ")[-1]

# Derive countries: ir3 => character vector
ir3 <- unlist(ir[1,4:ncol(ir), drop = TRUE])

# Derive values: ir4 => data.frame
ir4 <- unlist(ir[2:nrow(ir), 4:ncol(ir)])

# Reshape into long df: ir5 => data.frame
ir5 <- within(
  data.frame(
    cbind(
      stat = ir2, 
      country = rep(ir3, each = length(ir2)),
      val = ir4
    ),
    row.names = NULL
  ),
  {
    year <- substring(stat, nchar(stat)-4)
    stat <- trimws(gsub(paste0(year, collapse = "|"), "", stat))
  }
)

# Pivot: data.frame => stdout(console)
reshape(
  ir5, 
  idvar=c("country", "stat"),
  timevar="year", 
  v.names="val", 
  direction="wide"
)

Upvotes: 1

akshaymoorthy
akshaymoorthy

Reputation: 346

You can use data table, after dropping the extra info in the first couple of rows which aren't really data.

names(df_given) <- c("country","Real C 2021", "Real C 2022", "Inf 2021", "Inf 2022")
df_given <- df_given[-c(1:3),]
library(data.table)
setDT(df_given)
melt(df_given, measure = patterns("^Real C","^Inf"), value.name = c("2021","2022"))

    country variable 2021 2022
1: Country1        1   10    1
2: Country2        1   11  1.2
3: Country3        1   23  2.5
4: Country1        2   20    5
5: Country2        2   22    3
6: Country3        2   12    2

Documentation

Upvotes: 1

Related Questions