Cleaning up unused images in your Markdown content with PowerShell

I was recently tasked with cleaning up some Markdown content with a bunch of screenshots. Sometimes as content was revised, an image would no longer be used, but the image wasn’t deleted. As a result, the images folder would often be packed with files that were no longer used in the final Markdown content.

On a few blocks of content, I would do this manually in VS Code. From the file list (Ctrl+Shift+E), I’d select the file, copy the file name (F2, then Ctrl+C), and search all the files for that file name (Ctrl+Shift+F, then Ctrl+V). This was painful to do for more than a few blocks, so I decided to turn to automation, PowerShell in this case.

PowerShell is available on Windows and Linux/macOS, so it’s great for wherever I need it. It even seems to properly translate my path separators on different platforms.
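As a small illustration of the path-separator point, here is a sketch that builds a path with Join-Path instead of typing the separator by hand. The folder and file names are just placeholders borrowed from the examples later in this post.

```powershell
# Build a path from its parts and let PowerShell choose the separator.
$imagePath = Join-Path -Path 'media' -ChildPath 'some-image.png'

# Shows media\some-image.png on Windows and media/some-image.png on Linux/macOS.
$imagePath
```

I still type `.\media\` by habit in the rest of this post, but Join-Path is handy when a script needs to run on more than one platform.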

Getting the image file names

There are a lot of aliases in PowerShell. They either make verbose commands shorter and easier to remember, or they duplicate functionality found in the host system. For example, the dir command is available in PowerShell, but it is actually an alias for the Get-ChildItem command. (Please note that PowerShell commands are actually called “cmdlets”, just not here…sorry, not sorry.)
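If you are curious what an alias really points to, Get-Alias will tell you. This is only a quick sketch; the list of aliases you get back for Get-ChildItem can differ between platforms and PowerShell versions.

```powershell
# Which cmdlet does "dir" resolve to?
$target = (Get-Alias -Name dir).Definition
$target

# Which aliases point at Get-ChildItem on this machine?
Get-Alias -Definition Get-ChildItem | ForEach-Object { $_.Name }
```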

So, to get all the image file names I want to search for, I’ll need a variant of Get-ChildItem. In my case, these images were in a “media” folder.

powershell Get-ChildItem .\media\

This will list out my files, but let’s map the resulting objects into just the filename strings.

powershell Get-ChildItem .\media\ | ForEach { $_.Name }

Now, we “pipe” (|) the resulting Get-ChildItem file objects individually into the ForEach-Object command (or just ForEach as we used), and for each item, noted with $_, we grab the Name property to make sure we have just the filename and not the full file path.
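For what it’s worth, here is an alternative sketch of the same idea using Select-Object -ExpandProperty and the -File switch, which skips any subfolders. It runs against a throwaway folder in the temp directory so it doesn’t depend on my media folder; the folder and file names are made up for the demo.

```powershell
# Scratch folder so the example does not depend on a real media folder.
$media = Join-Path ([System.IO.Path]::GetTempPath()) "media-demo-$(Get-Random)"
New-Item -ItemType Directory -Path (Join-Path $media 'archive') -Force | Out-Null
Set-Content -Path (Join-Path $media 'diagram.png') -Value 'fake image'
Set-Content -Path (Join-Path $media 'old-screenshot.png') -Value 'fake image'

# -File leaves out subfolders such as "archive";
# -ExpandProperty unwraps the Name property into plain strings.
$names = Get-ChildItem -Path $media -File | Select-Object -ExpandProperty Name
$names
```

The ForEach version reads fine for a one-liner; this form is mostly useful if the images folder also contains directories you don’t want treated as image names.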

Searching a file for a string

As a quick side-step, let’s figure out how to search a file for a particular string. In PowerShell, there is the Select-String command, which the docs say, “Finds text in strings and files.” It can take a string and return the line containing the given -Pattern parameter. I won’t argue it’s the fastest or most accurate way to do text search in files, as I haven’t done any research on alternatives, but it works well enough for my use case.

powershell Select-String -Pattern "something awesome" -Path .\content\*.md

I’m not stressing this command too much. Select-String can do much more advanced stuff, especially if you need regular expressions.
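One regular expression detail does matter for file names, though: Select-String treats the pattern as a regex by default, so the “.” in “diagram-v2.png” matches any character. The sketch below shows the difference the -SimpleMatch switch makes; the file names and the scratch folder are invented for the demo.

```powershell
# Scratch folder with one Markdown file that references an image.
$content = Join-Path ([System.IO.Path]::GetTempPath()) "content-demo-$(Get-Random)"
New-Item -ItemType Directory -Path $content -Force | Out-Null
Set-Content -Path (Join-Path $content 'post.md') -Value '![Diagram](media/diagram-v2.png)'

# As a regex, "." matches any character, so this also hits "diagram-v2_png".
$regexHit = 'media/diagram-v2_png' | Select-String -Pattern 'diagram-v2.png' -Quiet

# -SimpleMatch treats the pattern as literal text, so the same input no longer matches.
# -Quiet returns $true on a match instead of match objects.
$literalHit = 'media/diagram-v2_png' | Select-String -Pattern 'diagram-v2.png' -SimpleMatch -Quiet

# The same switches work when searching files.
$inFiles = Select-String -Path (Join-Path $content '*.md') -Pattern 'diagram-v2.png' -SimpleMatch -Quiet
```

For my clean-up, a false positive only means an unused image survives a little longer, so I didn’t worry about it, but -SimpleMatch is the safer choice when the search text is a file name.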

Combination: Find unused images

With a list of image file names, I can now use another command to filter the list: Where-Object (or just Where). If you are familiar with C#, this is just like the Where function in LINQ. With any collection, you pass in a script block that returns a Boolean. Any item for which that script block returns true will make it into the resulting output.

In this case, for each image file name, I want to see if it doesn’t produce any results when run through Select-String on any of the Markdown files.

powershell Get-ChildItem .\media\ | ForEach { $_.Name } | Where { (Get-ChildItem .\includes\*.md | Select-String $_).Count -eq 0 }

Things start to get a little nested here, but we get all the image file names as before and then filter them. For each file name ($_ inside the Where block), we run Select-String across all the Markdown files (those with the .md extension) and keep the name only if the search comes back with no matches (.Count -eq 0). Any image file name that makes it through was not mentioned in the Markdown content, which, in my case, means it is fair game for deletion.
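If the one-liner is hard to read, the same filter can be spread over a few lines. This is a sketch rather than the exact command I ran: it works on scratch folders with invented file names, reads the Markdown file list once, and adds -SimpleMatch and -List, which are my own additions here.

```powershell
# Scratch folders standing in for .\media\ and .\includes\ (hypothetical sample data).
$root = Join-Path ([System.IO.Path]::GetTempPath()) "unused-demo-$(Get-Random)"
$media = Join-Path $root 'media'
$includes = Join-Path $root 'includes'
New-Item -ItemType Directory -Path $media, $includes -Force | Out-Null
Set-Content -Path (Join-Path $media 'diagram.png') -Value 'fake image'
Set-Content -Path (Join-Path $media 'old-screenshot.png') -Value 'fake image'
Set-Content -Path (Join-Path $includes 'intro.md') -Value '![Diagram](../media/diagram.png)'

# Read the Markdown file list once instead of once per image.
$markdownFiles = Get-ChildItem -Path (Join-Path $includes '*.md')

$unused = Get-ChildItem -Path $media -File | Where-Object {
    $imageName = $_.Name
    # -SimpleMatch keeps the "." in the file name from acting as a regex wildcard;
    # -List stops at the first match in each file, which is all we need to know.
    $hits = @($markdownFiles | Select-String -Pattern $imageName -SimpleMatch -List)
    $hits.Count -eq 0
}

# Only old-screenshot.png is never mentioned in the Markdown.
$unused | ForEach-Object { $_.Name }
```

Keep in mind this is still a plain text search: an image named “diagram.png” counts as used if any Markdown file mentions “big-diagram.png”.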

Deleting a file

With one last side-step, let’s figure out how to delete a file. In this situation, any image I have that isn’t referenced is bound for the trash.

powershell Remove-Item -Path .\media\some-image.png

The Remove-Item command will delete the item at the given path. That path can be a pattern or even a collection, which will be handy when I have a list of images I no longer want.
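Before deleting anything for real, it can be worth a dry run. The sketch below passes a collection of paths to Remove-Item, first with the -WhatIf switch, which only reports what would be deleted, and then without it. The file names are placeholders in a scratch folder.

```powershell
# Scratch folder with two throwaway files.
$media = Join-Path ([System.IO.Path]::GetTempPath()) "remove-demo-$(Get-Random)"
New-Item -ItemType Directory -Path $media -Force | Out-Null
$images = 'a.png', 'b.png' | ForEach-Object { Join-Path $media $_ }
$images | ForEach-Object { Set-Content -Path $_ -Value 'fake image' }

# Dry run: -WhatIf reports what would be deleted without touching anything.
Remove-Item -Path $images -WhatIf
$existsAfterDryRun = Test-Path -Path $images[0]

# Real run: -Path accepts the whole collection in one call.
Remove-Item -Path $images
$remaining = @(Get-ChildItem -Path $media)
```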

Final combination: Deleting images unused in Markdown

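With all the commands and a little bit of piping them together, I can now assemble a command that will find all unused images and delete them immediately. There is one change from the previous command: a bare file name piped into Remove-Item would be resolved against the current directory rather than the media folder, so I skip the ForEach step, keep the file objects from Get-ChildItem flowing through the pipeline, and read $_.Name inside the Where filter instead. The file objects that survive the filter are piped directly into the Remove-Item command.

powershell Get-ChildItem .\media\ | Where { (Get-ChildItem .\includes\*.md | Select-String $_.Name).Count -eq 0 } | Remove-Item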

And with that, goodbye unused images.

Adding HTTPS to WordPress with Cloudflare

patridgedev.com with a fancy green "Secure" padlock

With some helpful persuasion from Planet Xamarin, I decided my blog needed to support HTTPS requests. I’ve wanted to find a way to support HTTPS traffic on my blog for a while, but I wanted to find a solution that would Just Work™ without me doing the leg work. I’ll stick to writing the blog posts and let someone else worry about hosting details.

Just like anything security-related, setting up SSL intimidates me a bit (more so because this often involves the dark magic of DNS records). I’ll do security when I have to, but my prerequisite is always that I read so much about the topic I feel comfortable explaining every decision I make to either an expert or someone completely unfamiliar with the topic.

Encrypting patridgedev.com traffic

This blog is currently a WordPress site on a host using AdminPanel. I have spent hours in IIS for ASP.NET sites, but almost nothing on Linux hosting. It’s all just me poking around hoping to find things. Fortunately, when it comes to SSL support, there are simple options that can bolt on to other hosting: specifically Let’s Encrypt certificates and Cloudflare’s secure proxy system.

Everything I looked into for Let’s Encrypt said to use the settings in AdminPanel, but that required some administrator management to enable. These are not my servers, so that was a no-go.

Cloudflare is a proxy between the world and my site that just happens to offer some mostly-automated, and free, SSL support. You give it a domain, it scans the DNS for that domain and creates a new DNS layer, with some requests going through their system before your servers see them. With those DNS records in Cloudflare’s system, you simply update your nameservers at your domain registrar, Namecheap for most of my domains. (Disclaimer: if you buy Namecheap services through that link, they’ll give me a little bit as an affiliate.)

In doing this proxy dance through Cloudflare’s DNS, they can encrypt the connection between any web visitor and the Cloudflare servers. These servers then make requests to my existing servers and return the result through that encrypted connection, making patridgedev.com “support” HTTPS requests. You could argue that the connection isn’t yet encrypted end-to-end, which is true, but it’s way better than before where the connection could be compromised at any point from your computer to my servers.

There was zero downtime with this transition. My old DNS entries went away, and the new Cloudflare ones took over. Unfortunately, it also didn’t magically make my site SSL-awesome.

patridgedev.com with several mixed-security broken things making it render horribly in most browsers

Making WordPress SSL-awesome

You probably don’t want to stop here, as most browsers will block mixed content like this. If a malicious attacker can intercept your HTTP requests for the JavaScript on an HTTPS web site, they can do all sorts of nasty things (stealing authentication cookies, making requests on your [authenticated] behalf, changing content in subtle ways). You really do want the browser to prevent that for important sites, even if it means things break. For patridgedev.com, I just don’t want anyone to have such a bad experience on HTTPS, so I had to fix it.

Most of these issues were because links were still being fully qualified to http:// inside WordPress. The simple solution to this was supposed to be setting the WordPress Address and Site Address to https://www.patridgedev.com under Settings>General, but that didn’t seem to do anything for the site. In fact, it made it so my admin WordPress pages would get stuck in a redirect loop.

Setting the WordPress and Site Address fields to HTTPS

Investigating with Chrome Developer Tools showed a request being made over HTTPS and an immediate redirect to the exact same HTTPS URL, meaning it wasn’t redirecting back and forth between two places as I first suspected. I didn’t have anything custom in an .htaccess file, so it wasn’t causing the redirects. The DNS settings didn’t have anything funky that would cause this, just normal A records for www and non-www. I even searched my site’s code for instances of http:, but that was a really noisy set of search results.

My searches for “WordPress too many redirects” were finding lots of potential suspects, but none seemed to apply to my site. Once I added “Cloudflare” to the search terms, I stumbled on this Stack Overflow question, where the original poster appeared to be having an .htaccess issue. The question had one sad answer with no votes (at the time), and it turned out to be my magic bullet.

Inside WordPress is an is_ssl method that checks a couple of $_SERVER variables to decide if the current request is using HTTPS and/or port 443, neither of which is technically true for the requests reaching my server at that point. WordPress was not seeing HTTPS requests, because they really aren’t. The connection to Cloudflare is secure, but my site is seeing Cloudflare’s proxied requests via old-school port-80 HTTP.

This is_ssl method gets called all over the place to decide how to prefix fully-qualified URLs with a protocol. In order to get through that logic, we have to force the $_SERVER['HTTPS'] variable to be true at the right time before that method is ever called. That’s exactly what the Stack Overflow answer establishes. We set up a little extra check in the site’s root wp-config.php file for a particular header seen on Cloudflare’s proxied requests hitting my site.

php if (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) && $_SERVER['HTTP_X_FORWARDED_PROTO'] == 'https') { $_SERVER['HTTPS'] = 'on'; }

With that in place, we force the HTTPS variable to be 'on' for these requests whenever is_ssl is later called for URL generation, resulting in much happiness. There were still a few non-SSL requests on my blog pages, most of which were my own doing.

Cleaning up the last errors

I had a few widget HTML blocks that were referring to HTTP content: an ancient Amazon affiliate script include, an old Stack Overflow profile badge image, an Apple App Store widget for Smudges, and a lot of embedded post images (and post cross-links). All said, it was just a visual find-and-replace job in WordPress. Only the Apple widget wasn’t as simple as adding an “s” to a URL.

Apple app widget snapshot after HTTPS efforts

The App Store widget definitely doesn’t cooperate with an HTTPS page. Presumably, it is mishandling things on a secure connection. It’s especially evil since it seems to render fine, so the non-SSL request is totally unnecessary. Unfortunately, even the latest iframe HTML from Apple hasn’t changed, so the error won’t go away.

It turns out that I had a tendency to be inconsistent when embedding images in my posts. While there are probably ways to do mass find-and-replace efforts in WordPress, I just opened all my posts individually and did the work there. It wasn’t as bad as it sounds; only a handful of posts had the domain hard-coded, and I haven’t written enough posts to feel intimidated by the count.

I probably still need to set up a forced redirect for non-SSL requests to be secure by default. This could also make sure search engines don’t see HTTP and HTTPS pages as different content, but I’ll work toward that another time, maybe when I finally get to rewriting this blog as a static-content site (e.g., Jekyll and GitHub Pages).

I’m hoping none of this will affect anyone reading blog posts here, but if you see anything looking unusual, drop me a line on Twitter.

Return of Netduino, .NET on small hardware

A little piece of magic wandered into my Twitter feed recently. Between the Windows IoT stuff I messed with this summer, controlling LEDs via Xamarin.Forms, and this awesome news, I’m bound to be learning the hardware side of things a little better.

That’s right, Netduino has returned; this time Wilderness Labs has taken the reins (with Bryan Costanich heading things up). Netduino is a hardware platform, similar to Arduino with lots of input and output options through a bunch of pins on the board, that allows interacting with it using .NET. Not only has Wilderness Labs resurrected Netduino, they have released three new boards: Netduino 3, Netduino 3 Ethernet, and Netduino 3 WiFi.

According to the Wilderness Labs Netduino docs, these new boards are as fast as ever, all running Cortex-M4 processors at 168MHz, and with more flash storage and a bump in RAM: the base Netduino 3 has 384KB of flash (matching the old Netduino 2 Plus), and the Netduino 3 Ethernet and Netduino 3 WiFi both have a bump to 1,408KB. If it wasn’t obvious, the Netduino 3 Ethernet board also offers a 10/100 Ethernet port, and the Netduino 3 WiFi has an 802.11b/g/n Wi-Fi adapter. So, they are as fast as ever, with more resources to do what you tell them.

I have been interested in Netduino since I first heard about it, though I didn’t buy one to play with until the release of the prior Netduino 2 Plus. All I had ever done with that system was blink the on-board LED. A few hours after I read that tweet, I was digging through my box-o’-electronics until I found it. This time, I decided I would do something more useful, or at least more fun.

Getting hardware rolling [again]

Since I wasn’t entirely sure what firmware was installed on my Netduino, I decided to install the latest from Wilderness Labs. I figured there was no sense in finding a bug that was fixed in the later firmware versions.

Unfortunately, every time I tried to flash the firmware on my MacBook, it would get stuck. After several unsuccessful attempts to debug the issue, I randomly ended up using a different USB cable. Everything flashed perfectly after that, and I tossed the defective USB cable. While it may not help with my faulty cable issue, Alex Corrado has since updated the error handling on the macOS firmware update tool, so you may not have as much trouble debugging issues by the time you read this.

Getting an IDE rolling

What little code I wrote the last time I did Netduino development was probably on Visual Studio 2013. While the Netduino Discord channel looks like Netduino is quickly moving to Visual Studio 2017 and Visual Studio for Mac, it’s not quite 100 percent there yet. So, I busted out my older MacBook with Xamarin Studio. (I could have also installed Visual Studio 2015, but having spent so much time with 2017’s more self-contained installs makes me not want to return to the previous era.)

Since I was going the Xamarin Studio route, I added the Micro Framework add-in/extension as instructed. A quick restart of the IDE, and I was up and rolling, blinking the same LED I’d blinked a few years back.

What’s next?

I have played with a few sensors and LEDs already, and some more code-happy posts will be coming soon.

So far, I have a few random toys to play with: a water-resistant temperature sensor, a BME280 temperature/pressure/altitude sensor, and an ultrasonic range sensor.

Even if the Netduino is overkill, I might start working on a small temperature-monitoring system feeding to Azure IoT Hub with an app front-end for our chicken coop. Or maybe a silly well-sensored aquaponics system. As much fun as blinking LEDs can be, what other semi-useful things should I be trying to make? Drop me a line on Twitter if you have any suggestions.