The same result can probably be achieved with a clever use of quantile, but I like having an easy way to compute summary statistics for every n entries of my dataset, be it a vector or a data.frame.

The function takes three parameters: the R object to summarize (x), the number of entries each summary should cover (step, defaulting to 1000), and the function to apply to each block of entries (fun, defaulting to "mean").

Then, it’s all about using aggregate.

The present version incorporates useful comments by pat and ap53!

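summarize.by <- function(x, step = 1000, fun = "mean")
{
  n <- NROW(x)
  # Block index: the first step entries get 1, the next step entries get 2, ...
  # each = step keeps every block (except possibly the last) exactly step long
  group <- rep(seq_len(ceiling(n / step)), each = step)[seq_len(n)]
  x <- data.frame(group, x)
  x <- aggregate(x, by = list(x$group), FUN = fun)
  # Drop the two grouping columns (Group.1 and group) and keep the summaries
  x <- x[, -c(1, 2)]
  return(x)
}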
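
To make the grouping idea concrete, here is a minimal sketch on a toy vector. The values and the step of 3 are made up for illustration. The block index is built with rep(..., each = step), which keeps every block exactly step entries long even when the length is not a multiple of step; only the last block comes out shorter.

```r
# Toy data: 7 values, summarized in blocks of 3 (the last block has 1 value)
x <- c(1, 2, 3, 4, 5, 6, 7)
step <- 3
n <- NROW(x)

# Block index, one entry per value: 1 1 1 2 2 2 3
group <- rep(seq_len(ceiling(n / step)), each = step)[seq_len(n)]

# One row per block: Group.1 holds the block index, x the block mean
res <- aggregate(data.frame(x), by = list(group), FUN = "mean")
print(res)
```

The Group.1 column is the one that aggregate adds for the grouping variable, which is why a wrapper around this step has to drop it before returning the summaries.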

Example application and result for a data.frame:

dummy<-data.frame(matrix(runif(100000,0,1),ncol=10))
summarize.by(dummy)
          X1        X2        X3        X4        X5        X6        X7
1  0.5081756 0.5206011 0.4972622 0.5060707 0.4907807 0.5063138 0.4982252
2  0.5014300 0.5093051 0.5015310 0.4718058 0.4931249 0.4882382 0.5084970
3  0.4994759 0.4979546 0.4964157 0.5138695 0.5018427 0.5228862 0.4980824
4  0.4970300 0.4953163 0.4954068 0.5157935 0.4770471 0.5000562 0.4960250
5  0.5118221 0.4967686 0.5114420 0.4945936 0.5016019 0.5003544 0.5016693
6  0.5026323 0.4995367 0.5003587 0.4970245 0.4992188 0.4993896 0.4873300
7  0.4911944 0.5081578 0.4858666 0.4974576 0.4864710 0.5022401 0.5058064
8  0.5050684 0.5021456 0.4970707 0.4829222 0.4980984 0.4901941 0.5053296
9  0.4910359 0.4883865 0.4915000 0.4984415 0.4941274 0.4933778 0.4964306
10 0.4832396 0.4986647 0.5017873 0.5008766 0.4952849 0.5036030 0.5084799
          X8        X9       X10
1  0.5052379 0.4906292 0.4916262
2  0.5074966 0.5117570 0.5183119
3  0.4988349 0.5029704 0.5077726
4  0.4889516 0.5066026 0.5078195
5  0.5068717 0.4988389 0.5018225
6  0.5010366 0.4870614 0.4827767
7  0.5148197 0.5083662 0.5037901
8  0.4979452 0.5273463 0.4944513
9  0.5130718 0.5061075 0.5058208
10 0.4896030 0.4911127 0.4956848

And for a vector:

dummy<-runif(10000,0,1)
summarize.by(dummy)
 [1] 0.4914789 0.4908839 0.4951939 0.4928015 0.4911908 0.4994735 0.4947729
 [8] 0.5058204 0.5026956 0.5018375
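
For a plain vector, a similar per-block summary can also be sketched with base R's tapply instead of aggregate. This is an alternative route rather than the function above, and the seed, the step of 2500 and the choice of median are assumptions picked only for illustration.

```r
# Reproducible toy vector (the seed is arbitrary)
set.seed(1)
dummy <- runif(10000, 0, 1)
step <- 2500

# Integer division turns positions 1..10000 into block numbers 1..4
group <- (seq_along(dummy) - 1) %/% step + 1

# Median of each block of 2500 entries, returned as a named array
res <- tapply(dummy, group, median)
print(res)
```

Unlike aggregate, tapply works on one vector at a time, so a data.frame would need this applied column by column.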