Reputation: 447
I have a table with data from an stream gauge arranged as this:
Water.Year May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr
1 1953-1954 55.55 43.62 30.46 26.17 26.76 41.74 19.92 41.25 28.77 20.96 12.47 10.51
2 1954-1955 23.49 81.35 46.71 29.33 67.83 133.30 37.62 30.16 21.07 19.38 13.87 10.63
3 1955-1956 9.87 51.59 55.36 63.03 154.08 98.15 104.06 32.85 22.89 17.30 15.68 10.88
> data <- structure(list(Water.Year = structure(1:6, .Label = c("1953-1954", "1954-1955", "1955-1956", "1956-1957", "1957-1958", "1958-1959", "1959-1960", "1960-1961", "1961-1962", "1962-1963", "1963-1964", "1964-1965", "1965-1966", "1966-1967", "1967-1968", "1968-1969", "1969-1970", "1970-1971", "1971-1972", "1972-1973", "1973-1974", "1974-1975", "1975-1976", "1976-1977", "1977-1978", "1978-1979", "1979-1980", "1980-1981", "1981-1982", "1982-1983", "1983-1984", "1984-1985", "1985-1986", "1986-1987", "1987-1988", "1988-1989", "1989-1990", "1990-1991", "1991-1992", "1992-1993", "1993-1994", "1994-1995", "1995-1996", "1996-1997", "1997-1998", "1998-1999", "1999-2000", "2000-2001"), class = "factor"), May = c(55.55, 23.49, 9.87, 18.03, 17.46, 11.37), Jun = c(43.62, 81.35, 51.59, 28.61, 15.14, 29.48), Jul = c(30.46, 46.71, 55.36, 24.36, 20.09, 19.48), Ago = c(26.17, 29.33, 63.03, 22.01, 16.97, 16.86), Set = c(26.76, 67.83, 154.08, 28.51, 27.24, 21.01), Oct = c(41.74, 133.3, 98.15, 53.72, 35.78, 19.78), Nov = c(19.92, 37.62, 104.06, 115.78, 20.35, 18.69), Dic = c(41.25, 30.16, 32.85, 32.04, 22, 18.86), Ene = c(28.77, 21.07, 22.89, 25.44, 13.27, 14.89), Feb = c(20.96, 19.38, 17.3, 14.53, 10.37, 10.4), Mar = c(12.47, 13.87, 15.68, 10.78, 8.77, 8.79), Abr = c(10.51, 10.63, 10.88, 9.33, 7.69, 8.99)), .Names = c("Water.Year", "May", "Jun", "Jul", "Ago", "Set", "Oct", "Nov", "Dic", "Ene", "Feb", "Mar", "Abr"), row.names = c(NA, 6L), class = "data.frame")
It is arranged by "water years", where each year starts in May and ends in April of the next year (this can be seen in the first column). I want to convert it to a dataframe with three columns: Calendar.Year -- Month -- Flow.Measurement
I already broke down the Water.Year column into two columns using "separate" from tidyr:
> df = separate(data, Water.Year, c("year1","year2"))
year1 year2 May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr
1 1953 1954 55.55 43.62 30.46 26.17 26.76 41.74 19.92 41.25 28.77 20.96 12.47 10.51
2 1954 1955 23.49 81.35 46.71 29.33 67.83 133.30 37.62 30.16 21.07 19.38 13.87 10.63
Now I'm planning to use "gather" from tidyr to do the rest of the transformation but I'm stuck at how I can create a Calendar.Year column using year1 for columns May to Dec and year2 for Jan to Apr.
Any help will be appreciated.
Upvotes: 2
Views: 263
Reputation: 447
I decided to use some pieces of all the answers I got. This is the code I wrote:
library(dplyr)
library(tidyr)
#separate the year column into two years
df_years <- df %>%
separate(Water.Year, c("Year1", "Year2"))
#create two different dataframes for each section of the year
df1 <- subset(df_years, select = c(Year1, May:Dec))
df2 <- subset(df_years, select = c(Year2, Jan:Apr))
#rename both years' columns using the same name
colnames(df2)[1] <- "Year"
colnames(df1)[1] <- "Year"
#join both dataframes
cleandata <- full_join(df1, df2, by = "Year")
#sort months chronologically
cleandata <- cleandata[, c("Year", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")]
#convert to tidy data set
cleandata <- gather(cleandata, "Month", "Flow", 2:13)
#sort by year and month
cleandata <- arrange(cleandata, Year, Month)
Upvotes: 0
Reputation: 21621
Another idea (using @useR data with English months)
library(dplyr)
library(tidyr)
df %>%
separate(Water.Year, c("Year1", "Year2")) %>%
gather(Month, Value, -(Year1:Year2)) %>%
group_by(Year1, Year2) %>%
mutate(Year = if_else(match(Month, month.abb) >= 5, Year1, Year2),
Month = factor(Month, levels = month.abb)) %>%
ungroup() %>%
select(Year, Month, Value) %>%
arrange(Year, Month)
We separate the Water.Year
column into Year1
and Year2
and we reshape the data into long format using gather()
. Then, for each group, we use match()
and month.abb
to check if the month is greater or equal to 5 (May) and assign the corresponding year with if_else()
. Finally, we remove unncessary columns and arrange()
by Year
and Month
## A tibble: 36 × 3
# Year Month Value
# <chr> <fctr> <dbl>
#1 1953 May 55.55
#2 1953 Jun 43.62
#3 1953 Jul 30.46
#4 1953 Aug 26.17
#5 1953 Sep 26.76
#6 1953 Oct 41.74
#7 1953 Nov 19.92
#8 1953 Dec 41.25
#9 1954 Jan 28.77
#10 1954 Feb 20.96
## ... with 26 more rows
Upvotes: 3
Reputation: 18681
Ok, I just realized the months you provided in structure()
may be in a different language. I will just stick with the data I created, which uses the English version of Months. This way people can see the corresponding solution in English.
library(tidyr) # for separate function
library(reshape2) # for melt function
library(dplyr) # for pipe operator and arrange function
# Reproducible Data
weather = structure(list(Water.Year = structure(1:3, .Label = c("1953-1954",
"1954-1955", "1955-1956"), class = "factor"),
May = c(55.55, 23.49, 9.87),
Jun = c(43.62, 81.35, 51.59),
Jul = c(30.46, 46.71, 55.36),
Aug = c(26.17, 29.33, 63.03),
Sep = c(26.76, 67.83, 154.08),
Oct = c(41.74, 133.3, 98.15),
Nov = c(19.92, 37.62, 104.06),
Dec = c(41.25, 30.16, 32.85),
Jan = c(28.77, 21.07, 22.89),
Feb = c(20.96, 19.38, 17.3), Mar = c(12.47, 13.87, 15.68),
Apr = c(10.51, 10.63, 10.88)), .Names = c("Water.Year", "May",
"Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb",
"Mar", "Apr"), class = "data.frame", row.names = c(NA, -3L))
Coding starts here:
df = separate(weather, Water.Year, c("year1","year2"))
# Split into two datasets
columns1 = c("year1", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Dec")
df1 = subset(df, select = c(year1, May:Dec))
df2 = subset(df, select = c(year2, Jan:Apr))
longdf1 = melt(df1, variable.name = "Month", id.vars = "year1",
value.name = "Flow.Measurement")
names(longdf1)[1] = "Calendar.Year"
longdf2 = melt(df2, variable.name = "Month", id.vars = "year2",
value.name = "Flow.Measurement")
names(longdf2)[1] = "Calendar.Year"
# Combine the two datasets
final_df = rbind(longdf1, longdf2)
# Releveling the Month
final_df$Month = factor(final_df$Month, levels = month.abb)
final_df = arrange(final_df, Calendar.Year, Month)
Final dataframe:
> final_df
Calendar.Year Month Flow.Measurement
1 1953 May 55.55
2 1953 Jun 43.62
3 1953 Jul 30.46
4 1953 Aug 26.17
5 1953 Sep 26.76
6 1953 Oct 41.74
7 1953 Nov 19.92
8 1953 Dec 41.25
9 1954 Jan 28.77
10 1954 Feb 20.96
11 1954 Mar 12.47
12 1954 Apr 10.51
13 1954 May 23.49
14 1954 Jun 81.35
15 1954 Jul 46.71
16 1954 Aug 29.33
17 1954 Sep 67.83
18 1954 Oct 133.30
19 1954 Nov 37.62
20 1954 Dec 30.16
21 1955 Jan 21.07
22 1955 Feb 19.38
23 1955 Mar 13.87
24 1955 Apr 10.63
25 1955 May 9.87
26 1955 Jun 51.59
27 1955 Jul 55.36
28 1955 Aug 63.03
29 1955 Sep 154.08
30 1955 Oct 98.15
31 1955 Nov 104.06
32 1955 Dec 32.85
33 1956 Jan 22.89
34 1956 Feb 17.30
35 1956 Mar 15.68
36 1956 Apr 10.88
Upvotes: 1
Reputation: 135
Ok how about this. It's a mashup between reshape and base R.
I used your dataset once you posted it. Thanks for providing it.
data <- structure(list(Water.Year = structure(1:6, .Label = c("1953-1954", "1954-1955", "1955-1956", "1956-1957", "1957-1958", "1958-1959", "1959-1960", "1960-1961", "1961-1962", "1962-1963", "1963-1964", "1964-1965", "1965-1966", "1966-1967", "1967-1968", "1968-1969", "1969-1970", "1970-1971", "1971-1972", "1972-1973", "1973-1974", "1974-1975", "1975-1976", "1976-1977", "1977-1978", "1978-1979", "1979-1980", "1980-1981", "1981-1982", "1982-1983", "1983-1984", "1984-1985", "1985-1986", "1986-1987", "1987-1988", "1988-1989", "1989-1990", "1990-1991", "1991-1992", "1992-1993", "1993-1994", "1994-1995", "1995-1996", "1996-1997", "1997-1998", "1998-1999", "1999-2000", "2000-2001"), class = "factor"), May = c(55.55, 23.49, 9.87, 18.03, 17.46, 11.37), Jun = c(43.62, 81.35, 51.59, 28.61, 15.14, 29.48), Jul = c(30.46, 46.71, 55.36, 24.36, 20.09, 19.48), Ago = c(26.17, 29.33, 63.03, 22.01, 16.97, 16.86), Set = c(26.76, 67.83, 154.08, 28.51, 27.24, 21.01), Oct = c(41.74, 133.3, 98.15, 53.72, 35.78, 19.78), Nov = c(19.92, 37.62, 104.06, 115.78, 20.35, 18.69), Dic = c(41.25, 30.16, 32.85, 32.04, 22, 18.86), Ene = c(28.77, 21.07, 22.89, 25.44, 13.27, 14.89), Feb = c(20.96, 19.38, 17.3, 14.53, 10.37, 10.4), Mar = c(12.47, 13.87, 15.68, 10.78, 8.77, 8.79), Abr = c(10.51, 10.63, 10.88, 9.33, 7.69, 8.99)), .Names = c("Water.Year", "May", "Jun", "Jul", "Ago", "Set", "Oct", "Nov", "Dic", "Ene", "Feb", "Mar", "Abr"), row.names = c(NA, 6L), class = "data.frame")
I decided to use the year information you had from before and just add in calendar year based on that. Since we know May-Dec is Year 1, and Jan-Apr is Year 2. Maybe a bit convoluted but it gets the job done.
df = separate(data, Water.Year, c("year1","year2"))
library(reshape2)
fixDF<-melt(df)
fixDF$CalendarYear<-rep(NA,nrow(fixDF))
fixDF$CalendarYear[min(which(fixDF$variable=="May")):max(which(fixDF$variable=="Dic"))]<-df$year1
fixDF$CalendarYear[min(which(fixDF$variable=="Ene")):max(which(fixDF$variable=="Abr"))]<-df$year2
fixDF<-fixDF[,3:5]
colnames(fixDF)<-c("Month","Flow.Measurement", "Calendar.Year")
Upvotes: 1