Reputation: 691
I am converting some SAS code into R and am stuck on a portion to reshape the final data output into a new format. I have a data frame, df, that looks like this:
State AREA Year species ncount_ip est.ip se.ip est.tib se.tib
1 CT 12593 2015 ABDU 56 1349.250 943.2464 4497.50 2871.4829
2 CT 12593 2015 GADW 56 224.875 224.3744 6746.25 6290.3472
3 CT 12593 2015 COME 56 0.000 0.0000 0.00 0.0000
4 VT 12593 2015 ABDU 56 8545.250 1756.8546 19114.38 5443.0618
5 VT 12593 2015 COME 56 674.625 498.0543 1349.25 996.1086
6 VT 12593 2015 GADW 56 224.875 224.3744 449.75 448.7489
That I want to put in a format like this (final):
Species Type Year VTest VTse CTest CTse
GADW Pop 2015 449.75 448.7489 6746.25 6290.3472
GADW Pairs 2015 224.875 224.3744 224.875 224.3744
ABDU Pop 2015 19114.38 5443.0618 4497.50 2871.4829
ABDU Pairs 2015 8545.250 1756.8546 1349.250 943.2464
COME Pop 2015 1349.25 996.1086 0.00 0.00
COME Pairs 2015 674.625 498.0543 0.00 0.00
Basically, I need to take from df the estimate (est) and standard error (se) of pop. (.tib) and pairs (.ip) for each state (only 2 given in example but about 10 in actual dataset) to form 2 rows for each species with 2 columns for each state to create final.
I started by trying package reshape and melting but don't quite get what I need. I think renaming by state after melting might work, but can't code that up appropriately. Thanks for your time and help.
Upvotes: 0
Views: 61
Reputation: 1202
Starting with your df
, and using the reshape2
package, first melt the dataframe, preserving the first 5 columns:
> library(reshape2)
> melted <- melt(df, id.vars=1:5)
> head(melted)
State AREA Year species ncount_ip variable value
1 CT 12593 2015 ABDU 56 est.ip 1349.250
2 CT 12593 2015 GADW 56 est.ip 224.875
3 CT 12593 2015 COME 56 est.ip 0.000
4 VT 12593 2015 ABDU 56 est.ip 8545.250
5 VT 12593 2015 COME 56 est.ip 674.625
6 VT 12593 2015 GADW 56 est.ip 224.875
Use colsplit
to pull out the statistic and type identifiers from the new variable
column (old headers), and add these columns to dataframe:
> melted <- cbind(melted,
+ colsplit(melted$variable, "\\.", c('stat','type')))
> head(melted)
State AREA Year species ncount_ip variable value stat type
1 CT 12593 2015 ABDU 56 est.ip 1349.250 est ip
2 CT 12593 2015 GADW 56 est.ip 224.875 est ip
3 CT 12593 2015 COME 56 est.ip 0.000 est ip
4 VT 12593 2015 ABDU 56 est.ip 8545.250 est ip
5 VT 12593 2015 COME 56 est.ip 674.625 est ip
6 VT 12593 2015 GADW 56 est.ip 224.875 est ip
Combine State
and stat
to achieve your desired labels, and replace strings in type
:
> melted$state_stat <- paste(melted$State, melted$stat, sep ="_")
> melted$type <- gsub("tib", "Pop", melted$type)
> melted$type <- gsub("ip", "Pairs", melted$type)
> head(melted)
State AREA Year species ncount_ip variable value stat type state_stat
1 CT 12593 2015 ABDU 56 est.ip 1349.250 est Pairs CT_est
2 CT 12593 2015 GADW 56 est.ip 224.875 est Pairs CT_est
3 CT 12593 2015 COME 56 est.ip 0.000 est Pairs CT_est
4 VT 12593 2015 ABDU 56 est.ip 8545.250 est Pairs VT_est
5 VT 12593 2015 COME 56 est.ip 674.625 est Pairs VT_est
6 VT 12593 2015 GADW 56 est.ip 224.875 est Pairs VT_est
Recast the dataframe, using new state_stat
column for columns:
> final <- dcast(melted, Year+species+type~state_stat, value.var="value")
> final
Year species type CT_est CT_se VT_est VT_se
1 2015 ABDU Pairs 1349.250 943.2464 8545.250 1756.8546
2 2015 ABDU Pop 4497.500 2871.4829 19114.380 5443.0618
3 2015 COME Pairs 0.000 0.0000 674.625 498.0543
4 2015 COME Pop 0.000 0.0000 1349.250 996.1086
5 2015 GADW Pairs 224.875 224.3744 224.875 224.3744
6 2015 GADW Pop 6746.250 6290.3472 449.750 448.7489
Upvotes: 1