Reputation: 6367
So I am trying to get total size of a directory using Go. So far I have this:
var dirSize int64 = 0
func readSize(path string, file os.FileInfo, err error) error {
if !file.IsDir() {
dirSize += file.Size()
}
return nil
}
func DirSizeMB(path string) float64 {
dirSize = 0
filepath.Walk(path, readSize)
sizeMB := float64(dirSize) / 1024.0 / 1024.0
sizeMB = Round(sizeMB, .5, 2)
return sizeMB
}
The question is whether the dirSize
global variable is going to cause problems and if it does, how do I move it to the scope of the DirSizeMB
function?
Upvotes: 21
Views: 18390
Reputation: 670
Based on the answer from @Dave C, this is an improved function for better performance. I would say this is 2x faster.
func DirSize(path string) (int64, error) {
var size int64
var mu sync.Mutex
// Function to calculate size for a given path
var calculateSize func(string) error
calculateSize = func(p string) error {
fileInfo, err := os.Lstat(p)
if err != nil {
return err
}
// Skip symbolic links to avoid counting them multiple times
if fileInfo.Mode()&os.ModeSymlink != 0 {
return nil
}
if fileInfo.IsDir() {
entries, err := os.ReadDir(p)
if err != nil {
return err
}
for _, entry := range entries {
if err := calculateSize(filepath.Join(p, entry.Name())); err != nil {
return err
}
}
} else {
mu.Lock()
size += fileInfo.Size()
mu.Unlock()
}
return nil
}
// Start calculation from the root path
if err := calculateSize(path); err != nil {
return 0, err
}
return size, nil
}
Upvotes: 0
Reputation: 7878
Using a global like that at best is bad practice.
It's also a race if DirSizeMB
is called concurrently.
The simple solution is to use a closure, e.g.:
func DirSize(path string) (int64, error) {
var size int64
err := filepath.Walk(path, func(_ string, info os.FileInfo, err error) error {
if err != nil {
return err
}
if !info.IsDir() {
size += info.Size()
}
return err
})
return size, err
}
You could assign the closure to a variable if you think that looks better.
Upvotes: 43
Reputation: 2177
One thing you could do would be to define a channel inside of DirSizeMB
, and define readSize
inside of that function so it gets the channel as a closure. Then send all of the sizes out the channel and sum them as you receive them.
func DirSizeMB(path string) float64 {
sizes := make(chan int64)
readSize := func(path string, file os.FileInfo, err error) error {
if err != nil || file == nil {
return nil // Ignore errors
}
if !file.IsDir() {
sizes <- file.Size()
}
return nil
}
go func() {
filepath.Walk(path, readSize)
close(sizes)
}()
size := int64(0)
for s := range sizes {
size += s
}
sizeMB := float64(size) / 1024.0 / 1024.0
sizeMB = Round(sizeMB, 0.5, 2)
return sizeMB
}
http://play.golang.org/p/zzKZu0cm9n
Unless you've read the underlying code, you don't actually know how filepath.Walk
invokes your readSize function. While it probably calls it sequentially over all of the files on the given path, the implementation could theoretically invoke several of these calls simultaneously on separate goroutines (the docs would probably mention this if it did). In any case, in a language designed for concurrency, it's good practice to make sure that your code is safe.
The answer that @DaveC gives shows how to do this by using a closure over a local variable solves the problem of having a global variable, so multiple simultaneous calls to DirSize would be safe. The Docs for Walk explicitly state that the walk function runs over files in a deterministic order, so his solution is sufficient for this problem, but I'll leave this as an example of how to make it safe to run the inner function concurrently.
Upvotes: -1
Reputation: 635
If you want to use a variable, you can do this:
func DirSizeMB(path string) float64 {
var dirSize int64 = 0
readSize := func(path string, file os.FileInfo, err error) error {
if !file.IsDir() {
dirSize += file.Size()
}
return nil
}
filepath.Walk(path, readSize)
sizeMB := float64(dirSize) / 1024.0 / 1024.0
return sizeMB
}
Upvotes: 0