The GPS coordinate privacy issue

How do you remove some EXIF information, namely the GPS coordinates, from pictures using Go?

The problem

I’ve been working on a blog, half because I need it to share news with a specific group of people, half because I needed an excuse to write some Go and I thought it would be fun.

Said blog has an image upload feature, and I chose to not resize/recompress the images.

Most of the pictures I’ll post are taken with a mobile phone (usually an iPhone), and while I do trust all my readers, I don’t like the idea that my precise location will be shared.

Also, to be honest, where’s the fun in using a pre-made tool for blogging, and/or for stripping GPS tags, right?

Where is the GPS info? What is a good Golang Exif library?

Well, based on my previous knowledge, my hunch was the EXIF header. So I started looking into how to read EXIF from Go. There are only a few libraries, and the one that seems furthest along is https://pkg.go.dev/github.com/dsoprea/go-exif. Off we go then.
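To make the hunch concrete: in a JPEG file, EXIF data lives in an APP1 segment (marker 0xFFE1) whose payload starts with the bytes "Exif" followed by two null bytes. Here is a minimal, standard-library-only sketch that walks the segment headers and reports where that segment sits. The function name findExifSegment is made up for this illustration, and the walk is deliberately simplified (it ignores fill bytes and stops at the first sign of trouble), so treat it as a way to peek at a file, not as a replacement for the library.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// exifHeader is the 6-byte identifier that opens an EXIF payload in APP1.
var exifHeader = []byte("Exif\x00\x00")

// findExifSegment walks the JPEG segment headers and returns the offset of
// the APP1 segment that carries EXIF data, or -1 if there is none before
// the compressed image data starts. (Hypothetical helper, simplified walk.)
func findExifSegment(data []byte) int {
	// Every JPEG starts with the SOI marker 0xFFD8.
	if len(data) < 2 || data[0] != 0xFF || data[1] != 0xD8 {
		return -1
	}
	pos := 2
	for pos+4 <= len(data) {
		if data[pos] != 0xFF {
			return -1 // not on a marker boundary, give up
		}
		marker := data[pos+1]
		if marker == 0xDA || marker == 0xD9 {
			return -1 // SOS (image data) or EOI: no EXIF segment found
		}
		// The 2-byte big-endian length counts itself but not the marker.
		length := int(binary.BigEndian.Uint16(data[pos+2 : pos+4]))
		if length < 2 || pos+2+length > len(data) {
			return -1 // truncated or corrupt segment
		}
		payload := data[pos+4 : pos+2+length]
		if marker == 0xE1 && bytes.HasPrefix(payload, exifHeader) {
			return pos
		}
		pos += 2 + length
	}
	return -1
}
```

The GPS tags themselves sit deeper, inside the TIFF structure that follows this header, which is the part the go-exif package takes care of.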

To start with, I stuck with JPEG, and it was pretty straightforward. Here’s a working function: pass it a ReadSeeker and it will return a byte slice containing your image, without the GPS info.

The important line is: rootIb.DeleteAll(exifcommon.IfdGpsInfoStandardIfdIdentity.TagId())

import (
	"bytes"
	"fmt"
	"io"

	exifcommon "github.com/dsoprea/go-exif/v3/common"
	jpegstructure "github.com/dsoprea/go-jpeg-image-structure/v2"
)

func removeGPSFromImage(data io.ReadSeeker, size int) ([]byte, error) {
	mp := jpegstructure.NewJpegMediaParser()
	intfc, err := mp.Parse(data, size)
	if err != nil {
		return nil, fmt.Errorf("parsing image: %v", err)
	}

	sl, ok := intfc.(*jpegstructure.SegmentList)
	if !ok {
		// err is nil at this point, so report the unexpected type instead.
		return nil, fmt.Errorf("casting segment list: unexpected type %T", intfc)
	}
	rootIb, err := sl.ConstructExifBuilder()
	if err != nil {
		return nil, fmt.Errorf("building the builder: %v", err)
	}

	n, err := rootIb.DeleteAll(exifcommon.IfdGpsInfoStandardIfdIdentity.TagId())
	if err != nil {
		return nil, fmt.Errorf("deleting GPS nodes: %v", err)
	}
	fmt.Printf("Removed %d GPS info\n", n)

	// Update the exif segment.
	err = sl.SetExif(rootIb)
	if err != nil {
		return nil, fmt.Errorf("setting new exif header: %v", err)
	}

	b := new(bytes.Buffer)
	err = sl.Write(b)
	if err != nil {
		return nil, fmt.Errorf("writing output buffer: %v", err)
	}
	return b.Bytes(), nil
}

Supporting PNG

The same great person on GitHub also has a package for PNG: github.com/dsoprea/go-png-image-structure/v2. It’s not a drop-in replacement because the JPEG segment list has a Write() method while the PNG chunk slice has a WriteTo(), but you can pretty easily build around it.

Define an interface

type builder interface {
	ConstructExifBuilder() (*exif.IfdBuilder, error)
	SetExif(*exif.IfdBuilder) error
	Write(io.Writer) error
}

And all you need is a type that satisfies the interface.
There is probably a way to do this with interface embedding, but it seems annoying and this works.

type pngWriter struct {
	c *pngstructure.ChunkSlice
}

// Write adapts the PNG WriteTo() method to the Write() name used for JPEG.
func (pw *pngWriter) Write(w io.Writer) error {
	return pw.c.WriteTo(w)
}
func (pw *pngWriter) ConstructExifBuilder() (*exif.IfdBuilder, error) {
	return pw.c.ConstructExifBuilder()
}
func (pw *pngWriter) SetExif(b *exif.IfdBuilder) error {
	return pw.c.SetExif(b)
}

And in order to use it, you do need to know what type the image is.
Here’s the finished function.

func removeGPSFromImage(data io.ReadSeeker, format supportedFormat, size int) ([]byte, error) {
	var mp riimage.MediaParser
	// Parse the image.
	switch format {
	case formatJPEG:
		mp = jpegstructure.NewJpegMediaParser()
	case formatPNG:
		mp = pngstructure.NewPngMediaParser()
	default:
		return nil, fmt.Errorf("unknown format: %q", format)
	}
	intfc, err := mp.Parse(data, size)
	if err != nil {
		return nil, fmt.Errorf("parsing image: %v", err)
	}

	var rootIb *exif.IfdBuilder
	var exifParser builder

	// Get an exifBuilder
	if format == formatJPEG {
		sl, ok := intfc.(*jpegstructure.SegmentList)
		if !ok {
			return nil, fmt.Errorf("casting segment list: unexpected type %T", intfc)
		}
		exifParser = sl
	} else if format == formatPNG {
		sl, ok := intfc.(*pngstructure.ChunkSlice)
		if !ok {
			return nil, fmt.Errorf("casting chunk slice: unexpected type %T", intfc)
		}
		if err := removeGPSFromPNG(sl); err != nil {
			return nil, err
		}
		exifParser = &pngWriter{sl}
	}
	rootIb, err = exifParser.ConstructExifBuilder()

	if err != nil {
		return nil, fmt.Errorf("building the builder: %v", err)
	}

	n, err := rootIb.DeleteAll(exifcommon.IfdGpsInfoStandardIfdIdentity.TagId())
	if err != nil {
		return nil, fmt.Errorf("deleting GPS nodes: %v", err)
	}
	glog.Infof("Removed %d GPS info", n)

	// Update the exif segment.
	err = exifParser.SetExif(rootIb)
	if err != nil {
		return nil, fmt.Errorf("setting new exif header: %v", err)
	}

	b := new(bytes.Buffer)
	err = exifParser.Write(b)
	if err != nil {
		return nil, fmt.Errorf("writing output buffer: %v", err)
	}
	return b.Bytes(), nil
}
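The function above takes the format as a parameter but the post doesn’t show where it comes from. One way to get it, sketched here under assumptions, is to look at the first bytes of the upload: JPEG files start with FF D8 FF and PNG files start with a fixed 8-byte signature. The supportedFormat definition, the constant values and the detectFormat name are guesses made for this example; only the identifiers formatJPEG and formatPNG appear in the real code.

```go
package main

import "bytes"

// supportedFormat is assumed to be a string type here; the original
// definition isn't shown in the post.
type supportedFormat string

const (
	formatJPEG    supportedFormat = "jpeg"
	formatPNG     supportedFormat = "png"
	formatUnknown supportedFormat = "" // hypothetical "no match" value
)

var (
	jpegMagic = []byte{0xFF, 0xD8, 0xFF}
	pngMagic  = []byte{0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'}
)

// detectFormat guesses the image format from the leading bytes of the file.
func detectFormat(head []byte) supportedFormat {
	switch {
	case bytes.HasPrefix(head, jpegMagic):
		return formatJPEG
	case bytes.HasPrefix(head, pngMagic):
		return formatPNG
	}
	return formatUnknown
}
```

Sniffing the content rather than trusting the file extension or the Content-Type sent by the browser avoids handing a PNG to the JPEG parser.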

Surprise! Your GPS info is still there!

Things were going well, until I opened a “cleaned” PNG, and saw that the GPS info was still very much in there. So what’s going on?

It turns out that some PNG files have iTXt chunks (see some details), each holding a keyword/value pair, and one of those values can be Adobe XMP XML, which can store the GPS location as well.
We could remove the XML piece completely, but while I’m at it, let’s parse the XML, remove the GPS nodes and keep the rest.

Extracting iTXt content

First, you need to extract the keyword and text. For now I won’t be supporting any compression (like zlib) that these images may be using. Here’s the format:

  • Keyword 1-79 bytes (character string)
  • Null separator 1 byte (null character)
  • Compression flag 1 byte
  • Compression method 1 byte
  • Language tag 0 or more bytes (character string)
  • Null separator 1 byte (null character)
  • Translated keyword 0 or more bytes
  • Null separator 1 byte (null character)
  • Text 0 or more bytes

// extractITXtXML returns the keyword, the XML, and optionally an error
func extractITXtXML(input []byte) (string, []byte, error) {
	buf := bytes.NewBuffer(input)

	keyword, err := readString(buf)
	if err != nil {
		return "", nil, fmt.Errorf("reading keyword: %w", err)
	}
	glog.Infof("found keyword: %s", keyword)

	compFlag, err := buf.ReadByte()
	if err != nil {
		return keyword, nil, fmt.Errorf("reading compression flag: %w", err)
	}
	if compFlag != 0 {
		return keyword, nil, errors.New("compression not supported")
	}

	_, err = buf.ReadByte()
	if err != nil {
		return keyword, nil, fmt.Errorf("reading compression method: %w", err)
	}

	language, err := readString(buf)
	if err != nil {
		return keyword, nil, fmt.Errorf("reading language: %w", err)
	}
	glog.Infof("found language: %s", language)

	translatedKeyword, err := readString(buf)
	if err != nil {
		return keyword, nil, fmt.Errorf("reading translated keyword: %w", err)
	}
	glog.Infof("found translatedKeyword: %s", translatedKeyword)

	return keyword, buf.Bytes(), nil
}

And this requires a little utility to read a null-terminated string from a byte reader:

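// readString reads bytes up to the next null terminator, which it consumes.
func readString(r io.ByteReader) (string, error) {
	var out []byte // not named "bytes", to avoid shadowing the package
	for {
		b, err := r.ReadByte()
		if err != nil {
			return "", fmt.Errorf("reading null-terminated string: %w", err)
		}
		if b == 0 {
			break
		}
		out = append(out, b)
	}
	return string(out), nil
}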
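To double-check my reading of the field layout listed above, here is a small standard-library-only round trip: one function assembles an uncompressed iTXt payload, the other splits it again. Both names (buildITXt and splitITXt) are invented for this sketch, and it makes the same assumption as the rest of the post, namely that compressed chunks are simply rejected.

```go
package main

import (
	"bytes"
	"errors"
)

// buildITXt assembles an uncompressed iTXt payload from its fields.
func buildITXt(keyword, language, translated string, text []byte) []byte {
	var out []byte
	out = append(out, keyword...)
	out = append(out, 0)    // null separator after the keyword
	out = append(out, 0, 0) // compression flag and method: uncompressed
	out = append(out, language...)
	out = append(out, 0) // null separator after the language tag
	out = append(out, translated...)
	out = append(out, 0) // null separator after the translated keyword
	return append(out, text...)
}

// splitITXt is the inverse: it returns the keyword and the text, skipping
// the language tag and the translated keyword.
func splitITXt(payload []byte) (string, []byte, error) {
	kw, rest, ok := bytes.Cut(payload, []byte{0})
	if !ok || len(rest) < 2 {
		return "", nil, errors.New("truncated iTXt payload")
	}
	if rest[0] != 0 {
		return string(kw), nil, errors.New("compressed iTXt not supported")
	}
	rest = rest[2:] // skip compression flag and compression method
	for i := 0; i < 2; i++ {
		_, after, ok := bytes.Cut(rest, []byte{0})
		if !ok {
			return string(kw), nil, errors.New("truncated iTXt payload")
		}
		rest = after
	}
	return string(kw), rest, nil
}
```

Note that the text field is not null-terminated: it runs to the end of the chunk, which is why everything left after the third separator is returned as-is.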

Parsing the AdobeXMP XML

This one is fairly straightforward, though the XML parsing capabilities of Go’s standard library are pretty limited. I found that github.com/beevik/etree was really easy to use for this purpose.

Given an element, this recursive function looks for all the “GPS” elements in the exif namespace.

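const exifNamespace = "http://ns.adobe.com/exif/1.0/"

// removeGPS detaches every element in the exif namespace whose tag starts
// with "GPS", searching e and all of its descendants.
func removeGPS(e *etree.Element) {
	if e.NamespaceURI() == exifNamespace && strings.HasPrefix(e.Tag, "GPS") {
		glog.Infof("Removing %s", e.FullTag())
		e.Parent().RemoveChild(e)
		return // the whole subtree is gone, no need to descend into it
	}
	for _, child := range e.ChildElements() {
		removeGPS(child)
	}
}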
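A cheap way to verify the result is to scan the cleaned XML again and make sure nothing GPS-related is left. The sketch below does that with the standard library’s encoding/xml token stream, which resolves namespace prefixes to their URIs. It also looks at attributes, because XMP allows simple properties to be written as attributes of rdf:Description instead of child elements, a case the element-only function above would not catch. The names containsGPS, isGPSName and exifNS are made up for this example.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"io"
	"strings"
)

const exifNS = "http://ns.adobe.com/exif/1.0/"

// isGPSName reports whether a resolved XML name is a GPS property in the
// exif namespace.
func isGPSName(n xml.Name) bool {
	return n.Space == exifNS && strings.HasPrefix(n.Local, "GPS")
}

// containsGPS reports whether any element or attribute in the exif
// namespace has a name starting with "GPS".
func containsGPS(xmp []byte) (bool, error) {
	dec := xml.NewDecoder(bytes.NewReader(xmp))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return false, nil // reached the end without finding anything
		}
		if err != nil {
			return false, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok {
			continue
		}
		if isGPSName(start.Name) {
			return true, nil
		}
		for _, attr := range start.Attr {
			if isGPSName(attr.Name) {
				return true, nil
			}
		}
	}
}
```

Running this on the XML before and after the cleanup gives a quick regression check that doesn’t depend on the library doing the removal.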

Putting it together

There may be more than one iTXt chunk in the file, so all we have to do is iterate over them and check whether one or more carry the Adobe XMP keyword.

func removeGPSFromPNG(cs *pngstructure.ChunkSlice) error {
	chunks := cs.Index()

	itxt, ok := chunks["iTXt"]
	if !ok {
		glog.Warning("no iTXt data in PNG file")
		// This however isn't an error, we have nothing to remove
		return nil
	}

	for _, chunk := range itxt {

		keyword, xml, err := extractITXtXML(chunk.Data)
		if err != nil {
			return err
		}

		if keyword != adobeXMPKeyword {
			glog.Infof("ignoring iTXt chunk with keyword %q", keyword)
			continue
		}

		doc := etree.NewDocument()
		if err := doc.ReadFromBytes(xml); err != nil {
			return err
		}

		removeGPS(doc.Root())
		doc.Indent(2)
		newXML, err := doc.WriteToBytes()
		if err != nil {
			return err
		}

		// Rebuild the uncompressed iTXt payload around the cleaned XML.
		var newData []byte
		newData = append(newData, adobeXMPKeyword...) // keyword
		newData = append(newData,
			0, // keyword terminator
			0, // compression flag (uncompressed)
			0, // compression method
			0, // end of (empty) language tag
			0) // end of (empty) translated keyword
		newData = append(newData, newXML...)

		// Replace
		chunk.Data = newData
		chunk.Length = uint32(len(newData))
		chunk.UpdateCrc32()

	}
	return nil
}

const adobeXMPKeyword = "XML:com.adobe.xmp"
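The last three lines of the loop (new data, new length, new CRC) make more sense once you know how a PNG chunk is laid out on disk: a 4-byte big-endian length of the data, the 4-byte chunk type, the data itself, and a CRC-32 computed over the type and the data (not the length). Here is a standard-library-only sketch of that serialization; encodeChunk is a name invented for this illustration and is not how the library does it internally.

```go
package main

import (
	"encoding/binary"
	"hash/crc32"
)

// encodeChunk serializes one PNG chunk: length, type, data, CRC-32.
// chunkType is assumed to be exactly 4 ASCII characters, e.g. "iTXt".
func encodeChunk(chunkType string, data []byte) []byte {
	out := make([]byte, 0, 12+len(data))

	// Length of the data field only, big-endian.
	var word [4]byte
	binary.BigEndian.PutUint32(word[:], uint32(len(data)))
	out = append(out, word[:]...)

	out = append(out, chunkType...)
	out = append(out, data...)

	// The CRC covers the chunk type and the data, but not the length.
	binary.BigEndian.PutUint32(word[:], crc32.ChecksumIEEE(out[4:]))
	return append(out, word[:]...)
}
```

This is why changing chunk.Data without also updating the length and the CRC would produce a file that strict decoders reject.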

One time? Two times!

At this point I was pretty happy with everything. And a tiny mistake ruined it all: as I output some images to check the GPS info was removed, I mistakenly used one of these output files as an input. Next thing I know, there’s a panic() in the exif library.

This seems to be due to a specific EXIF tag which isn’t written back properly.

I have reported this and found an ugly fix here.
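Independently of that fix (which is not what is shown here), a panic inside a third-party parser is something an upload handler probably wants to survive. A generic safety net is to run the call under recover() and turn the panic into an ordinary error. This is only a sketch with a hypothetical helper name, withRecover, and it does nothing about the underlying bug.

```go
package main

import "fmt"

// withRecover runs fn and converts a panic into an ordinary error, so one
// bad image cannot take the whole process down.
func withRecover(fn func() ([]byte, error)) (out []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			out = nil
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn()
}
```

In the blog code this could wrap the call to removeGPSFromImage in a closure, so the caller can reject the upload instead of crashing.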

What about HEIC?

Some browsers still don’t support HEIC (actually, most of them don’t), so when an HEIC image is uploaded, I want to convert it to JPEG or PNG. I was surprised to find no pure Go library handling HEIC. I would normally not mind cgo, but here I’m cross-compiling: building a Go binary on macOS with Apple Silicon, to be run on Linux x64. I know it’s doable, it just looked really annoying.

So instead, I ended up executing a configurable command, actually calling ImageMagick’s convert utility.

Don’t you love friends with a new camera?

And just when I thought I was all done, a good friend of mine sent me some of the pictures he took of us, with a Panasonic/Lumix DMC-G80, and here comes a new, unsupported EXIF tag called PrintIM. Essentially, this now throws an error when trying to call ConstructExifBuilder().

There’s not a ton of documentation on this tag.

This changelog on the libexif project, as well as this Perl source code, helped me implement a coder/decoder for the ‘dsoprea/go-exif’ package.

[EDIT May 13th 2023] Here’s the pull request I sent, and here’s the interesting part.

It’s not all there, but it’s good enough

After all this, I analyzed an image out of my pipeline with an online EXIF analyzer, and compared it with the original. I do find a few sections missing, as well as altered data. For example:

  • AccessorySerialNumber is now missing
  • AFPointPosition was “0.38 0.5”, it’s now “0 0”
  • “BabyAge” was reading “Not Set”, now it’s reading “0 2 4 6 3” separated by non-printable characters
  • The “LensSerialNumber” is gone

All in all, I don’t really mind. I honestly could have removed all EXIF info and been happy with it, but I do find it surprising, and I should probably get in touch with the maintainer/author to see if I can help.

A few extra things I learnt

  • There is a ton of information in EXIF, things like the battery level when the picture was taken!
  • How to modify go.mod in order to divert a dependency to another local directory you can play with: use the replace directive in your go.mod file
  • [EDIT May 13th 2023] I have learnt how to create a pull request on GitHub, and sent my first one!
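For reference, the replace directive mentioned in the list above looks like this in go.mod. The local path is a hypothetical checkout location; it has to point at a directory that contains the go.mod of the module being replaced.

```
// go.mod (excerpt): build against a local, modified copy of the library.
// "../go-exif/v3" is a made-up path for this example.
replace github.com/dsoprea/go-exif/v3 => ../go-exif/v3
```

Removing the line switches the build back to the published version listed in the require block.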