user3490233
user3490233

Reputation: 41

Manipulating data in R from columns to rows

I have data that is currently organized as follows:

 X.1 State MN    X.2    WI    X.3
     NA    Price Pounds Price Pounds
Year NA    
1980 NA    56    23     56    96
1999 NA    41    63     56    65

I would like to convert it to something more like this:

Year State Price Pounds
1980 MN    56    23
1999 MN    41    63
1980 WI    56    96
1999 WI    56    65

Any suggestions for some R-code to manipulate this data correctly? Thanks!

Upvotes: 2

Views: 84

Answers (2)

BrodieG
BrodieG

Reputation: 52637

This requires some manipulation to get it into a format that you can reshape.

df <- read.table(h=T, t=" X.1 State MN    X.2    WI    X.3
NA     NA    Price Pounds Price Pounds
Year NA    NA    NA     NA    NA
1980 NA    56    23     56    96
1999 NA    41    63     56    65")

df <- df[-2]

# Auto-process names; you should look at intermediate step results to see
# what's going on.  This would probably be better addressed with something
# like `na.locf` from `zoo` but this is all in base.  Note you can do something
# a fair bit simpler if you know you have the same number of items for each
# state, but this should be robust to different numbers.

df.names <- names(df)
df.names <- ifelse(grepl("X.[0-9]+", df.names), NA, df.names)
df.names[[1]] <- "Year"
df.names.valid <- Filter(Negate(is.na), df.names)
df.names[is.na(df.names)] <- df.names.valid[cumsum(!is.na(df.names))[is.na(df.names)]]
names(df) <- df.names

# rename again by adding Price/Pounds

names(df)[-1] <- paste(                                
  vapply(2:5, function(x) as.character(df[1, x]), ""), # need to do this because we're pulling across different factor columns
  names(df)[-1], 
  sep="."
)
df <- df[-(1:2),]   # Don't need rows 1:2 anymore
df

Produces:

  Year Price.MN Pounds.MN Price.WI Pounds.WI
3 1980       56        23       56        96
4 1999       41        63       56        65

Then:

using base reshape:

reshape(df, direction="long", varying=2:5)

Which gets you basically where you want to be:

     Year time Price Pounds id
1.MN 1980   MN    56     23  1
2.MN 1999   MN    41     63  2
1.WI 1980   WI    56     96  1
2.WI 1999   WI    56     65  2

Clearly you'll want to rename some columns, etc., but that's straightforward. The key point with reshape is that the column names matter so we constructed them in a way that reshape can use.

using reshape2::melt/cast:

library(reshape2)
df.mlt <- melt(df, id.vars="Year")
df.mlt <- transform(df.mlt, 
  metric=sub("\\..*", "", variable), 
  state=sub(".*\\.", "", variable)
)
dcast(df.mlt[-2], Year + state ~ metric)

produces:

  Year state Pounds Price
1 1980    MN     23    56
2 1980    WI     96    56
3 1999    MN     63    41
4 1999    WI     65    56

BE VERY CAREFUL, it is likely that Price and Pounds are factors because the column used to have both character and numeric values. You will need to convert to numeric with as.numeric(as.character(df$Price)).

Upvotes: 1

Rich Scriven
Rich Scriven

Reputation: 99321

Well that was a nice challenge. It's a lot of strsplits and greps, and it may not generalize to your entire data set. Or maybe it will, you never know.

> txt <- "X.1 State MN    X.2    WI    X.3
  NA    Price Pounds Price Pounds
  Year NA
  1980 NA    56    23     56    96
  1999 NA    41    63     56    65"
> 
> x <- textConnection(txt)
> y <- gsub("((X[.][0-9]{1})|NA)|\\s+", " ", readLines(x))
> z <- unlist(strsplit(y, "^\\s+"))
> a <- z[nzchar(z)]
> b <- unlist(strsplit(a, "\\s+"))
> nums <- as.numeric(grep("[0-9]", b[nchar(b) == 2], value = TRUE))
> Price = rev(nums[c(TRUE, FALSE)])
> pounds <- nums[-which(nums %in% Price)]
> data.frame(Year = rep(b[grepl("[0-9]{4}", b)], 2),
             State = unlist(lapply(b[grepl("[A-Z]{2}", b)], rep, 2)),
             Price = Price,
             Pounds = c(pounds[1], rev(pounds[2:3]), pounds[4]))
##   Year State Price Pounds
## 1 1980    MN    56     23
## 2 1999    MN    41     63
## 3 1980    WI    56     96
## 4 1999    WI    56     65

Upvotes: 0

Related Questions