I'm very new to R and still learning the basics. I haven't figured out how to do this particular operation yet, but it would save me a lot of time and labor.
I have a dataset of international conflicts with columns for country and dates that looks something like this:
country dates
Angola 1951-1953
Belize 1970-1972
I would like to reorganize the data into one row per country per year: split dates into a start year (yrstart) and an end year (yrend), and add a year-observed column (call it yrobs), so the set looks more like this:
country yrobs yrstart yrend
Angola 1951 1951 1953
Angola 1952 1951 1953
Angola 1953 1951 1953
Belize 1970 1970 1972
Belize 1971 1970 1972
Belize 1972 1970 1972
Someone suggested using data frames and a nested (double) for loop, but I got confused trying that. Any help would be greatly appreciated, and feel free to explain in plain language, as I'm still pretty new to programming. Thanks!
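For reference, here is a minimal base R sketch of the nested-loop approach mentioned above. It assumes every dates value has the form "YYYY-YYYY"; the names rows and result are only illustrative. The outer loop walks over the conflicts, the inner loop over the years of each conflict, and each (country, year) pair becomes a one-row data frame that is stacked at the end.

```r
# Example data in the same layout as the question
df <- data.frame(
  country = c("Angola", "Belize"),
  dates   = c("1951-1953", "1970-1972"),
  stringsAsFactors = FALSE
)

rows <- list()  # collects one small data frame per (country, year) pair
for (i in seq_len(nrow(df))) {
  # Split "1951-1953" into c("1951", "1953") and convert to integers
  yrs <- as.integer(strsplit(df$dates[i], "-", fixed = TRUE)[[1]])
  for (yr in seq(yrs[1], yrs[2])) {
    rows[[length(rows) + 1]] <- data.frame(
      country = df$country[i],
      yrobs   = yr,
      yrstart = yrs[1],
      yrend   = yrs[2],
      stringsAsFactors = FALSE
    )
  }
}

# Stack all the one-row data frames into a single data frame
result <- do.call(rbind, rows)
print(result)
```

It works, but you have to manage the list and its index by hand, which is the bookkeeping that the plyr approach avoids.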
No need for any for loops here. Use the power of R and its contributed packages, particularly plyr and reshape2.
library(reshape2)
library(plyr)
Create some data:
df <- data.frame(
  country = c("Angola", "Belize"),
  dates = c("1951-1953", "1970-1972")
)
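Use colsplit from the reshape2 package to split the dates column into two columns, start and end, and cbind them onto the original data frame. colsplit runs type.convert on each piece, so start and end come back as integers rather than strings.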
df <- cbind(df, colsplit(df$dates, "-", c("start", "end")))
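If you would rather not depend on reshape2 for this step, a base R sketch with strsplit() does the same split. It assumes every dates value contains exactly one hyphen between two years; parts is just an illustrative name.

```r
df <- data.frame(
  country = c("Angola", "Belize"),
  dates   = c("1951-1953", "1970-1972"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row, e.g. c("1951", "1953")
parts <- strsplit(df$dates, "-", fixed = TRUE)

# Take the first and second piece of each vector and convert them to integers
df$start <- as.integer(vapply(parts, `[`, character(1), 1))
df$end   <- as.integer(vapply(parts, `[`, character(1), 2))
print(df)
```

Unlike colsplit, strsplit leaves the pieces as character strings, hence the explicit as.integer().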
Now for the fun bit. Use ddply from the plyr package to split, apply and combine (SAC): ddply splits df by country, applies a function to each piece, and combines the results. The anonymous function inside ddply builds a small data.frame holding the country and its observed years, and the key step is seq(), which generates the sequence of years from the start year to the end year. The power of ddply is that it does all of this splitting, applying and combining in one step. Think of it as a loop, except that you don't need to keep track of any indexing variables.
ddply(df, .(country), function(x) {
  # x holds the row for a single country
  data.frame(
    country = x$country,
    yrobs = seq(x$start, x$end),  # one row per year, start to end inclusive
    yrstart = x$start,
    yrend = x$end
  )
})
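To see what ddply is doing for you, here is a sketch of the same split-apply-combine pattern in base R with split(), lapply() and do.call(rbind, ...). It starts from the data frame after the start and end columns have been added (recreated here so the block runs on its own). Like the ddply version, it assumes each country appears in exactly one row, because seq() needs a single start and end value.

```r
df <- data.frame(
  country = c("Angola", "Belize"),
  start   = c(1951L, 1970L),
  end     = c(1953L, 1972L),
  stringsAsFactors = FALSE
)

# Split: one piece per country, the same grouping as ddply(df, .(country), ...)
pieces <- split(df, df$country)

# Apply: expand each piece into one row per year from start to end
expanded <- lapply(pieces, function(x) {
  data.frame(
    country = x$country,
    yrobs   = seq(x$start, x$end),
    yrstart = x$start,
    yrend   = x$end,
    stringsAsFactors = FALSE
  )
})

# Combine: stack the pieces, then drop the row names rbind() builds from the list names
result <- do.call(rbind, expanded)
rownames(result) <- NULL
print(result)
```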
And the results:
country yrobs yrstart yrend
1 Angola 1951 1951 1953
2 Angola 1952 1951 1953
3 Angola 1953 1951 1953
4 Belize 1970 1970 1972
5 Belize 1971 1970 1972
6 Belize 1972 1970 1972
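Another base R option, offered here as an alternative rather than as the answer's method, skips the per-group function altogether: compute how many years each row spans, repeat each row that many times with rep(), and build yrobs with sequence(). Because it expands row by row instead of grouping by country, it also copes with a country that has more than one conflict. The names n and idx are only illustrative.

```r
df <- data.frame(
  country = c("Angola", "Belize"),
  start   = c(1951L, 1970L),
  end     = c(1953L, 1972L),
  stringsAsFactors = FALSE
)

# Number of observed years per row, counting both endpoints
n <- df$end - df$start + 1L

# Row index repeated once per observed year: 1, 1, 1, 2, 2, 2
idx <- rep(seq_len(nrow(df)), times = n)

result <- data.frame(
  country = df$country[idx],
  # sequence(n) gives 1..n[1], 1..n[2], ...; shift each run to begin at that row's start year
  yrobs   = df$start[idx] + sequence(n) - 1L,
  yrstart = df$start[idx],
  yrend   = df$end[idx],
  stringsAsFactors = FALSE
)
print(result)
```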