kimmyjo221

Reputation: 715

Split data in R and perform operation

I have a very large file that simply contains wave heights for different tidal scenarios at different locations. My file is organized into 13 wave heights x 9941 events, for 5153 locations.

What I want to do is read in this very long data file, which looks like this:

0.0
0.1
0.2
0.4
1.2
1.5
2.1 

.....

Then split it into segments of length 129,233 (corresponds to 13 tidal scenarios for 9941 events at a specific location). On this subset of the data I'd like to perform some statistical functions to calculate exceedance probability, among other things. I will then join it to the file containing location information, and print some output files.

My code so far is not working, although I've tried many things. It seems to read the data just fine; however, it is having trouble with the split. I suspect it may have something to do with the format of the input data from the file.

# read files with return period wave heights at defense points

#Read wave heights for 13 tides per 9941 events, for 5143 points
WaveRP.file <- paste('waveheight_test.out')
WaveRPtable <- read.csv(WaveRP.file, head=FALSE) 

WaveRP <- c(WaveRPtable)

#colnames(WaveRP) <- c("WaveHeight")

print(paste(WaveRP))

#Read X,Y information for defense points
DefPT.file <- paste('DefXYevery10thpt.out')
DefPT <- read.table(DefPT.file, head=FALSE)

colnames(DefPT) <- c("X_UTM", "Y_UTM")

#Split wave height data frame by defense point
WaveByDefPt <- split(WaveRP, 129233)

print(paste(length(WaveByDefPt[[1]])))

for (i in 1:length(WaveByDefPt)/129233){
        print(paste("i",i))
}

I have also tried

#Split wave height data frame by defense point
WaveByDefPt <- split(WaveRP, ceiling(seq_along(WaveRP)/129233))

No matter how I seem to perform the split, I am simply getting the original data as one long subset. Any help would be appreciated!

Thanks :) Kimberly

Upvotes: 0

Views: 135

Answers (2)

farnsy

Reputation: 2470

You are kind of shuffling the data into various data types here.

When the file is first read, it is a data frame with one column (V1). Passing it to c() turns it into a list that contains a single vector. Because that list has length 1, seq_along(WaveRP) is just 1, so split() puts everything into a single group no matter what you divide by. The numeric vector itself is WaveRP[[1]].
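A small sketch of why the split only ever sees one element, using a made-up three-row data frame (tbl is a hypothetical stand-in for the real file):

```r
# Hypothetical stand-in for the one-column data frame returned by read.csv
tbl <- data.frame(V1 = c(0.0, 0.1, 0.2))

as_list <- c(tbl)         # c() on a data frame yields a plain list
is.list(as_list)          # TRUE
length(as_list)           # 1: a single element holding the whole column
seq_along(as_list)        # 1, so a grouping built from it has only one group

heights <- tbl$V1         # the numeric vector itself
length(heights)           # 3, one entry per row
```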

Instead, just extract the numeric vector using the $ operator and then you can work with it. Or just work with it inside the data frame. The fun part will be thinking of a way to create the grouping vector. I'll give an example.

Something like this:

 WaveRP.file <- paste('waveheight_test.out')
 WaveRPtable <- read.csv(WaveRP.file, head=FALSE)
 WaveRPtable$group <- ceiling(seq_along(WaveRPtable$V1)/129233)
 SplitWave <- split(WaveRPtable,WaveRPtable$group)

Now you will have a list containing one data frame per location, each with 129,233 rows. Look at each one using double-bracket indexing: SplitWave[[2]], for example, is the second group. You can merge the location information file with these data frames individually.
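If the goal is one summary number per location, a rough sketch along these lines might work. The sizes, the threshold, and the coordinates are all made up for illustration (per_loc stands in for 129233), and the "fraction above a threshold" statistic is only a placeholder for whatever exceedance calculation is actually needed:

```r
# Toy sizes standing in for 129233 values per location (assumption)
per_loc <- 4
heights <- c(0.0, 0.1, 0.2, 0.4,
             1.2, 1.5, 2.1, 0.3,
             0.5, 0.6, 0.7, 0.8)

# Fail early if the data is not a whole number of locations
stopifnot(length(heights) %% per_loc == 0)

# One group id per value: 1,1,1,1,2,2,2,2,3,3,3,3
group <- rep(seq_len(length(heights) / per_loc), each = per_loc)

# Placeholder statistic: fraction of values per location above a threshold
threshold <- 0.5
exceed <- tapply(heights, group, function(h) mean(h > threshold))

# Attach the result to location rows (coordinates here are invented)
locations <- data.frame(X_UTM = c(100, 200, 300), Y_UTM = c(10, 20, 30))
locations$exceed <- as.vector(exceed)
```

This relies on the rows of the location file being in the same order as the segments in the wave height file, which is worth checking before trusting the join.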

Upvotes: 1

lukeA

Reputation: 54237

Try cut to build groups:

v <- as.numeric(readLines(n = 7))
0.0
0.1
0.2
0.4
1.2
1.5
2.1 
groups <- cut(v, breaks = 3) # bins by value; to group by position instead, cut seq_along(v) into length(v) / 129233 bins
aggregate(x = v, by = list(groups), FUN = mean) # e.g. means per group
#           Group.1     x
# 1 (-0.0021,0.699] 0.175
# 2     (0.699,1.4] 1.200
# 3       (1.4,2.1] 1.800
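
Since the segments in the question are defined by position in the file rather than by value, another option might be to reshape the vector into a matrix with one column per location. This is only a sketch with toy sizes (per_loc stands in for 129233, and the six values are taken from the sample above):

```r
per_loc <- 3                       # toy stand-in for 129233 (assumption)
v <- c(0.0, 0.1, 0.2, 0.4, 1.2, 1.5)

# The reshape only makes sense if the length is a multiple of per_loc
stopifnot(length(v) %% per_loc == 0)

# matrix() fills column by column, so column j holds the values of location j
m <- matrix(v, nrow = per_loc)

col_means <- colMeans(m)           # mean per location
col_max   <- apply(m, 2, max)      # any other per-location function works the same way
```

Column-wise functions then give one result per location without building a grouping vector at all.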

Upvotes: 1
