State of (my) Network: 2023

Welcome to AASullivan.com! I’ve been meaning to do an overview of my entire setup for a while, and after many years of tinkering I’ve finally kept everything in a consistent state for a few months. So here it is: the State of My Network. A little background…

My home network (also known as my “home lab”) has been a long-term project of mine to both try out new technologies and keep my skills sharp in a real server environment. It started in 2014 with a single server in my apartment running ESXi 3.5 on a system with a 6 core AMD CPU and 16GB of RAM. It was a blast to try new things (DNS server, file server, Plex, etc.) and I’ve learned a lot since then. My current home lab is considerably bigger, larger than many small company setups; my poor power bill will never be the same. Let’s take a look at some of the setup!

To start with the basics, I’m running a pfSense firewall connected to a Verizon Fios gigabit connection. Behind that is a 48 port gigabit switch with 10Gb SFP+ uplinks to a MikroTik 10Gb SFP+ backbone switch. I have 10Gb between all servers and my desktop, allowing regular speeds of around 550MB/s when going SSD to SSD. That’s the core backbone of my network. Time for the fun stuff.

I run a large number of self-hosted services including, but not limited to:

  • PiHole DNS blocking
  • Plex
  • Home Assistant
  • Killing Floor 2 gameserver
  • Tiny Tiny RSS
  • NextCloud
  • Zabbix
  • Uptime Kuma
  • Bookstack
  • Graylog
  • …and many more

Most things run on an ESXi 6.5 server: a 1U with 192GB RAM, dual 6 core Xeons and (6) 1TB SSDs in RAID 10. Alongside this are two Unraid servers (dual Dell R510s), mirrored and both running multiple services. RIP power bill.

The original goal of this network and home lab was to learn more about some enterprise solutions and keep my skills sharp. Instead, I’ve built a network more robust than many small business networks with a lot more control and functionality.

Things I’ve learned while doing all of this:

  • Your power bill is going to suffer (worth it!)
  • Servers put out a LOT of heat. I often notice this even upstairs, where the floor gets warm above the servers in the basement
  • Server hardware gives you a ton of ways to troubleshoot things, often with LEDs and beep codes to help narrow down issues
  • Low power hardware options are out there but are often much more expensive up front
  • Knowing what’s going on in your network is awesome. Knowing everything that’s going on can also drive one nuts when you see how often items are connecting to the internet (Looking at you Windows 10 and IoT devices)
  • If you want to build something, searching the internet can give you a lot of ideas and input. Most of my projects were done in an afternoon after finding a new product and reviewing issues/installation

All in all, I’ve taken control of my network, and as much as all the maintenance and updates can drive me nuts, it has been very stable for years and it’s a very good feeling to know what’s going on with everything.

Keep in mind: this can all start with an old desktop computer running Docker or a few services to tinker with. I started all of this with an Ubuntu 6.06 LAMP server in 2008, and it has grown into the passion it is today.

Thanks for sticking through my brain dump, I hope you enjoyed the read and will stop by again. Cheers!

VMWare Kubernetes Cluster Build Notes

I built a Kubernetes cluster tonight to try to learn a bit about the technology. Here are my notes from building it!

Built on a new ESXI server with the following specs (you’d be surprised how well this rig holds up):

  • Dell Optiplex 7010
  • Core i7-3770
  • 32GB RAM
  • 1TB SSD

Created a virtual network only accessible to VMs:

I used pfSense as the firewall to keep this lab independent from my main network and to provide DHCP to the internal virtual network, plus a single Windows Server 2019 VM to access the pfSense WebGUI. Needlessly complicated, but I wanted a fully virtual network to test all of this on.

Built the following VMs on Ubuntu 22.04 LTS:

  • Controller: 2 vCPUs, 4GB RAM, 50GB HDD
  • Worker X2: 2 vCPUs, 4GB RAM, 25GB HDD

I then set static IPs for all of them in pfSense to keep them from changing, and followed the guide below, specifically the scripts from the GitHub repo mentioned in the article:

https://devopscube.com/setup-kubernetes-cluster-kubeadm/

Command to download scripts:

git clone https://github.com/techiescamp/kubeadm-scripts

chmod +x both scripts to set them executable. Run the “common.sh” script on all systems, both controller and workers. Then edit the “master.sh” script: change the “MASTER_IP=” line to your controller’s IP address and adjust the pod network range under “POD_CIDR” if needed (I left it as default). Run master.sh on the controller.
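For reference, the sequence looks roughly like this, run from inside the cloned kubeadm-scripts directory (whether you need sudo depends on how you log in):

chmod +x common.sh master.sh
# run on every VM (controller and both workers):
sudo ./common.sh
# edit MASTER_IP (and POD_CIDR if needed), then run on the controller only:
sudo ./master.sh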

Run this command to get the join command:

kubeadm token create --print-join-command

Run the FULL output on any workers/nodes you want to join to the cluster. Then check the connection on the master/controller:

kubectl get nodes

Example output:
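It should look something like the following; the node names and version column depend on your VM hostnames and Kubernetes release, so treat these as illustrative:

NAME              STATUS   ROLES           AGE   VERSION
mini-controller   Ready    control-plane   12m   v1.26.x
mini-worker-1     Ready    <none>          4m    v1.26.x
mini-worker-2     Ready    <none>          4m    v1.26.x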

Note that the workers’ roles show as “<none>”. Fix this by running the following command, substituting in the names of your nodes/workers:

kubectl label node mini-worker-1 node-role.kubernetes.io/worker=worker

Then check again to confirm they are now workers:
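The output should now look something like this (again, names and versions here are illustrative):

NAME              STATUS   ROLES           AGE   VERSION
mini-controller   Ready    control-plane   15m   v1.26.x
mini-worker-1     Ready    worker          7m    v1.26.x
mini-worker-2     Ready    worker          7m    v1.26.x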

Congrats! You now have a (slightly dirty) Kubernetes cluster running and ready to assign deployments to! This is the bare basics but should get you rolling, good luck and happy tinkering!

Building a non-logging, encrypted DNS server

Welcome back! Today I’m working on a project to make my web surfing as anonymous as possible using a combination of a software package called “Pi-Hole” and a VPN provider.

So, let’s start at the basics: VPN and DNS

DNS, or Domain Name System, is how we find things on the web. Think of it like a pointer: when you go to facebook.com, your request goes to a DNS server, which takes the website name (facebook.com) and converts it to the IP address of a server hosting the website. Example:

euthonis@DESK:~$ nslookup facebook.com 172.31.1.26
Server: 172.31.1.26
Address: 172.31.1.26#53

Non-authoritative answer:
Name: facebook.com
Address: 31.13.67.35

Notice the 31.13.67.35. This is the IP address of the server hosting Facebook’s website. Neat, eh? This is how most web access works, except for rare circumstances where you would need an IP directly.

Now, VPNs.

VPNs are marketed as a way to hide your browsing and activity online, and this is true in most cases. VPN stands for “Virtual Private Network”. In a nutshell, a VPN creates an encrypted “tunnel” that all of your web traffic passes through, so your ISP cannot see what you’re doing. This offers a great level of privacy, but it doesn’t prevent website tracking cookies, so there are limits. Most VPN services (Nord, Mullvad, TorGuard) claim to keep zero logs on their systems; even if ordered by a court, they have no browsing history to hand over. Yes, this does sound a bit sketchy, but even a normal person can benefit from not having their browsing tracked by their ISP and their data sold to advertising companies.

So what happens if you want the ad-blocking that Pi-Hole offers along with the privacy of a VPN? You build your own DNS (Pi-Hole) server and set it up to be as anonymous as possible.

I followed the guide below for my build, using Ubuntu 22.04 LTS and ignoring the NetData portions (not needed for my use case):

Create DNS-over-TLS bridge with Pi-hole, unbound and stubby on Ubuntu Server

There are a couple of configuration changes to make to keep Pi-Hole from logging any requests:

  1. In the GUI/admin interface, go to Settings > Privacy (tab) and select “Anonymous Mode”. If an error occurs, go into Settings and click “Disable Query Logging”, then “Flush Logs (last 24 hours)”. This disables all Pi-Hole logging
  2. Modify the file:
    sudo nano /etc/unbound/unbound.conf.d/logs.conf

    Edit it to look like this:

# disable query logging
server:
    # If no logfile is specified, syslog is used
    # logfile: "/var/log/unbound/unbound.log"
    log-time-ascii: no
    log-queries: no
    log-replies: no
    verbosity: 0

Restart the services:

sudo systemctl restart unbound stubby ; systemctl status unbound stubby -l

With these options set, there is no longer any logging on the server.
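If you want to double check, one rough way (assuming unbound on your install logs through systemd/journald when it logs at all) is to watch the journal while browsing; with the settings above it should stay quiet:

sudo journalctl -u unbound -f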

For the final part of this, many VPN providers allow you to specify a custom DNS server. Take the IP address of your DNS server, enter it as the custom DNS server in the VPN client, and connect. You should be able to use the internet over the VPN as before, but now you have your own ad blocking via Pi-Hole and the assurance that your server keeps no logs or history. Assuming your VPN is trustworthy, you should be essentially invisible on the internet.

I hope this write up was helpful! I’ve been tinkering with these projects for some time off and on.

One last tip: if you find a website is blocked improperly by Pi-Hole, you may need to enable logging again (reversing the items from Step 1, above) to whitelist the problem domain. Don’t forget to turn logging back off after!

Notes from installing Nextcloud on Ubuntu 22.04LTS

Today I set up another Nextcloud server after taking the old one offline because the size of its backups was getting a little out of hand. This also lets me run the current LTS version of Ubuntu Server (22.04).

Here’s the guide I followed with a fresh install of Ubuntu:

How to Install Nextcloud on Ubuntu 22.04

Following this, I ran into a number of issues. Here are the fixes:

Downgrading to PHP 7.4:

apt install php7.4-mysql php7.4-mbstring php7.4-xml php7.4-curl php7.4-gd

Select 7.4 from list:

update-alternatives --config php

Set version:

sudo a2dismod php8.1
sudo a2enmod php7.4
sudo systemctl restart apache2

Install missing php packages:

apt install -y apache2 mariadb-server libapache2-mod-php7.4 \
  php7.4-gd php7.4-json php7.4-mysql php7.4-curl \
  php7.4-intl php7.4-mcrypt php-imagick \
  php7.4-zip php7.4-xml php7.4-mbstring

sudo systemctl restart apache2
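As a quick sanity check afterwards (the module name assumes the packages above), confirm the CLI version and that Apache loaded the PHP 7.4 module:

php -v
apache2ctl -M | grep -i php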


Tekkit2 Minecraft Build notes on Ubuntu 22.04LTS

Assuming a fresh install of Ubuntu 22.04LTS, accepting default options and no extra packages except OpenSSH server for login.

Install zip/unzip:

apt-get update; apt-get install zip unzip

Download archive for Tekkit2 (as of 20221202):

wget https://servers.technicpack.net/Technic/servers/tekkit-2/Tekkit-2_Server_v1.1.3.zip

Unzip:

unzip Tekkit-2_Server_v1.1.3.zip

Install Java and add Repos:

add-apt-repository ppa:webupd8team/java
apt install openjdk-8-jdk

Attempt to launch to confirm server is working:

chmod +x LaunchServer.sh
./LaunchServer.sh

It should take several minutes to load, depending on hardware. If this works, set up a cron job similar to the one below to launch the server on reboot:

root@syr-tekkit:# cat /etc/cron.d/root_gameserver
@reboot root sleep 5 && cd /home/asullivan && /usr/bin/screen -dmS gameserver-screen /home/asullivan/LaunchServer.sh
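To check on the server console later, you can reattach to the screen session named in the cron entry:

screen -r gameserver-screen
# detach again with Ctrl-A then D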

Open port 25565 to the internet while the server is running so players can connect. Use the Technic Launcher to join.

Battery backups: maintaining access when there’s no power

A little background: some years back I learned a very hard lesson about losing power on a RAID array that didn’t have an onboard battery backup. The result was ~7TB of data gone, about 1.5TB of it completely irreplaceable, including old schoolwork and photos. This was a hard pill to swallow and pushed me to get better about redundant backups and about another thing that was especially important: UPS backups, or uninterruptible power supplies.

A UPS is a device that provides power for a short time during a home or business power failure by providing AC output to whatever is plugged into it. I have several of these scattered throughout my home, including one for my desktop and a couple of older, smaller units powering lights that act as emergency lighting. My servers have always run on UPS backups, but on a much larger scale.

My company was kind enough in the fall of last year to give away a large amount of hardware. In that pile was a 2U rackmount UPS which can run all of my (4) 2U servers, firewall and backbone switches in my basement for about 30-40 minutes. I also still had my old UPS units: a 1350VA and a 1500VA system. I wanted to use these to their maximum potential.

I had to move around the server rack (store-bought metal shelving FTW), which gave me a great opportunity to plan ahead for what I wanted to do. Most servers come with redundant power supplies; that is, they accept multiple power inputs, so if one supply fails or stops receiving power, the server switches to the other without interruption or power loss. Neat, eh?

I ran all the primary power supplies into the new UPS and routed all the secondary power supplies and networking hardware into the older UPS units. This gives a total runtime somewhere in the 40+ minute range during a power outage. Not bad, but there’s a catch: how do I shut down (2) ESXi hosts, (2) unRAID hosts and multiple other smaller systems when the power is out and I need remote access? Simple: a battery backup on my desktop in my office.

I put yet another UPS in my office and ran a dedicated 10Gb SFP+ line to my core switch in the basement, which is also on a UPS. This lets my desktop run for somewhere between 25-30 minutes off power, along with some LEDs in my office acting as emergency lighting and one large monitor. I’d done testing before, but I finally had a real situation pop up.

Last night we had quite a wind storm and the power was out for about 30 minutes. Sometimes my power flickers for 1-2 minutes, but this was clearly a longer outage. After ten minutes, I logged in and began gracefully shutting down the VMs on the ESXi hosts, then shut down my two unRAID hosts as well. In just a few minutes, my entire network was gracefully shut down, without data loss or interruption. Adding to this, I was actually in the middle of a hard drive swap on one of my unRAID hosts, which finished without issue thanks to the UPS backups. Another win for preparedness.

I hope this gives you, the reader, some ideas for a home network or a small business and shows why this is so important; had I had a complete power failure without the battery backups, I could have lost multiple tens of terabytes of data and corrupted the disk rebuild on my unRAID system. A little planning goes a long way. Plus, it’s just cool; why else would we homelab? Cheers!

Raspberry Pi offline Wikipedia

Wikipedia is a vast archive of knowledge and information we tend to forget is there. An encyclopedia written by users and edited by a community, it has a high accuracy rate and information on just about any subject you could want. You can also download an entire archive of it, at around 90GB at the time of this writing!

I’ve had the idea for a while now of making an offline version to run locally for myself or friends, maybe something to browse during a flight or road trip. Or, as my prepping side says, maybe something easy to access when the power’s out! Enter the Raspberry Pi, a low cost and low power computer that can run this off a battery pack and be accessed from a phone/tablet/computer. This is easier than you might think! I’ll be going over the ideas and thought process at a high level, as the project took some time; feel free to reach out to me if you’d like more details.

The goals of this project were as follows:

  • Use a Raspberry Pi to run this off a battery pack for several hours at minimum
  • Must be 100% self contained; This should be able to boot, run and provide access without user input
  • Access must be simple; In this case, a self created Wifi network hosted from the Raspberry Pi
  • Small and easy to travel with. For this reason I went with a Raspberry Pi Zero W, one of the smallest Raspberry Pi single board computers that exists (about the size of a large flash drive).

Starting with the basics: the Raspberry Pi Zero W. This is a single board computer a little bigger than a flash drive which can be powered by a small USB battery pack and a micro-USB adapter. I installed a 128GB micro-SD card and flashed an image of Raspberry Pi OS onto it (a Debian-based ARM distribution).

The next step was to download a suite of tools called “Kiwix Tools”. This neat set of applications lets you host a downloaded archive of Wikipedia, providing users a simple web interface just like Wikipedia’s. Once done, I could access this over my local network.
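As a rough sketch, serving a downloaded ZIM archive with kiwix-serve looks something like this; the filename and port are just examples, so use whichever archive you grabbed from the Kiwix library:

./kiwix-serve --port 8080 wikipedia_en_all_maxi.zim

Browsing to the Pi’s address on that port then brings up the Kiwix web interface.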

Next up was using hostapd/dnsmasq to create a Wi-Fi network from the onboard wireless chipset, providing DHCP and local name resolution so I could connect from any device with Wi-Fi. I used a tablet to configure and confirm this; a minimal example of the two configs is sketched below.
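For reference, stripped-down hostapd and dnsmasq configs look roughly like this; the SSID, passphrase and address range are made up, and your interface name may differ:

# /etc/hostapd/hostapd.conf
interface=wlan0
driver=nl80211
ssid=OfflineWiki
hw_mode=g
channel=6
wpa=2
wpa_passphrase=changeme123
wpa_key_mgmt=WPA-PSK
rsn_pairwise=CCMP

# /etc/dnsmasq.conf
interface=wlan0
dhcp-range=192.168.50.10,192.168.50.50,255.255.255.0,12h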

For the last portion, I installed a LAMP stack onto the system (Linux + Apache + MySQL + PHP) and installed a copy of WordPress. I wanted a way to easily write notes into a webpage for anyone to see when accessing this: information, notes, ideas, etc. This was the easiest way to write and view them. It’s surprising how well this runs on the little system too!
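The install itself is just the standard packages plus the WordPress tarball; something along these lines (MariaDB standing in for MySQL, paths as examples):

sudo apt install apache2 mariadb-server php php-mysql
wget https://wordpress.org/latest.tar.gz
sudo tar -xzf latest.tar.gz -C /var/www/html/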

In the end, I have a small, ultra low power server which gives me access to a vast amount of information on an almost endless supply of topics, along with WordPress for notes and anything further I want to add!

In the future, I may add HTTrack to the system to rip websites for offline viewing as well, for even more offline access and information. This was more of a proof of concept, but I’m quite proud of how it turned out. Anyways, I hope you enjoyed my rambling and thoughts. Cheers!

unRAID: capacity and ease of use over performance

I’ve been looking at various NAS (Network Attached Storage) operating systems for some time now. Naturally, there are two big players that everyone seems to go to: FreeNAS and unRAID. Both boast a considerable user base, community add-on support and a ton of customization, but there’s one big difference at a quick glance: FreeNAS, as the name implies, is free, while unRAID is a paid, licensed OS. But a quick glance only shows so much.

After spending several months going back and forth, I decided to do some testing with unRAID. One of the biggest reasons was the mix of spare hard drives I wanted to use in the pool for the software RAID configuration. FreeNAS requires matching disks in pairs, and I have odd sets of drives in 4, 8 and 12TB capacities. I initially did some testing on an old 2U with 6 x 1TB disks to get used to the GUI, then upgraded one of the disks in the array to a 2TB disk to see the process. Spoiler: stupid easy and straightforward, exactly what I want. It was time to go big on the build.

I purchased a Dell R510XD server for this project: 32GB ECC RAM, twin 6 core Xeons and 12 bay capacity; the perfect number of drive bays and overkill on CPU and RAM for future proofing. Unfortunately, this was the beginning of a bit of a tough learning process…

Being new to software RAID, I forgot to take the hardware RAID card into account. The onboard H700 card does not support JBOD (Just a Bunch Of Disks), which lets the operating system see ALL the individual disks and build the software RAID from them. I had to bite the bullet and order another RAID card and cables that would support the proper config. Fifty bucks later, I was in business.

The initial configuration was this: (2) 12TB disks for parity, with (4) 8TB disks and (5) 4TB disks for the storage pool. With dual parity disks, up to 2 disks can fail without data loss. The initial parity build took about 30 hours, which isn’t bad overall. Unfortunately, I soon found the write speeds with the software RAID to be less than stellar, something unRAID is known for. I took the next step of adding a 1TB SSD as a cache disk to mitigate this and can now sustain gigabit throughput on uploads without issue.

On the software side of things, I’ve added a few of the usual plugins (Community Applications, Calibre, Plex and others). The installs take all of 30 seconds and typically run in a dedicated Docker instance, something I’d never tinkered with before but am quickly falling in love with for its simplicity and ease of maintenance. The software RAID seems robust, the GUI is sleek and modern, and everything is snappy and well laid out. I went through and upgraded capacity by replacing one 4TB disk with an 8TB (about 20 hours to rebuild), and this again was quick and painless.

One quick note: besides the disk layout, one of the biggest differences between unRAID and FreeNAS is performance. FreeNAS boasts considerably higher read/write speeds due to the way its parity works (excellent video summarizing this here). The other is that changing the array (modifying disks, adding, removing, etc.) takes considerably more work and effort, including CLI management of disks. As someone who’s broken a number of *NIX systems on the CLI, this was a bit of a deal breaker for me. Another difference stemming from the disk management: you can add just ONE more disk at a time to unRAID, whereas FreeNAS requires matching pairs.

All in all, I’m shocked at how well this project has come together. With the current config, I’m at 56TB raw and 51.8TB usable capacity. The system is used both as a file dump for all my stuff and as a redundant backup target for several other systems due to its capacity. I would definitely recommend trying the software out for free to see how you like it and whether it’s for you or your business.

Quick take: slower than FreeNAS, more capacity, and make sure you have a RAID card that supports JBOD or direct SATA pass-through.

Automated Youtube Downloads Into Plex (Windows)

Welcome to another Overly Complicated Project! This time, it started with some advice from our friends at r/DataHoarder and a fun tool called “youtube-dl”. This has taken a bit of tinkering and some custom code, but I now have an all-in-one solution that downloads YouTube videos from a playlist/channel, tracks progress to save bandwidth on future downloads, and stores them in a Plex library for local viewing. Let’s begin.

I started with a VM running Windows Server 2019. Following the steps below, you can install WSL (Windows Subsystem for Linux) and have a full Ubuntu/Linux shell to use. I chose Ubuntu 16.04LTS as it’s my favorite version of the server software.

https://docs.microsoft.com/en-us/windows/wsl/install-on-server

After installing WSL, I ran the below in the BASH/Ubuntu shell to install Pip, ffmpeg (used for video conversion by youtube-dl) and youtube-dl:

sudo apt update
sudo apt install python-pip ffmpeg
sudo pip install youtube-dl

This installs all the packages needed to get youtube-dl into a running state. My server has a 200GB boot disk and a 10TB secondary disk, so, opening the bash shell and changing to that disk, I made a folder called “youtube” to store all my videos in. As a test, you can run youtube-dl against a video of your choosing to confirm everything works. This is the basic command I use for everything, which sets the filename, retries, progress file, etc:

youtube-dl -o '%(playlist)s/%(title)s.%(ext)s' --format bestvideo+bestaudio/best --continue --sleep-interval 2 --verbose --download-archive PROGRESS.txt --ignore-errors --retries 10 --write-info-json --embed-subs --all-subs YOUTUBE_URL_HERE

To break this down:

  • -o: output template of Playlist (or channel)/Title.Extension
  • --format: best video and audio for the requested video
  • --continue: resume from the last download progress if interrupted
  • --sleep-interval: sleep for 2 seconds between downloads
  • --verbose: verbose output
  • --download-archive: track downloaded videos in the PROGRESS.txt file to save time and bandwidth
  • --ignore-errors: ignore errors and keep processing
  • --retries: retry when a 404 or similar error is found
  • --write-info-json: write a JSON file with information about the video
  • --embed-subs/--all-subs: use ffmpeg to embed subtitles into the video

The above is what I use for everything except 4K channels, which just take too much space to grab in bulk. Using this, I tossed several of those commands into a .sh file, such as this:

youtube-dl -o '%(playlist)s/%(title)s.%(ext)s' --format bestvideo+bestaudio/best --continue --sleep-interval 2 --verbose --download-archive PROGRESS.txt --ignore-errors --retries 10 --write-info-json --embed-subs --all-subs YOUTUBE_URL_HERE
youtube-dl -o '%(playlist)s/%(title)s.%(ext)s' --format bestvideo+bestaudio/best --continue --sleep-interval 2 --verbose --download-archive PROGRESS.txt --ignore-errors --retries 10 --write-info-json --embed-subs --all-subs YOUTUBE_URL_HERE
youtube-dl -o '%(playlist)s/%(title)s.%(ext)s' --format bestvideo+bestaudio/best --continue --sleep-interval 2 --verbose --download-archive PROGRESS.txt --ignore-errors --retries 10 --write-info-json --embed-subs --all-subs YOUTUBE_URL_HERE
youtube-dl -o '%(playlist)s/%(title)s.%(ext)s' --format bestvideo+bestaudio/best --continue --sleep-interval 2 --verbose --download-archive PROGRESS.txt --ignore-errors --retries 10 --write-info-json --embed-subs --all-subs YOUTUBE_URL_HERE
youtube-dl -o '%(playlist)s/%(title)s.%(ext)s' --format bestvideo+bestaudio/best --continue --sleep-interval 2 --verbose --download-archive PROGRESS.txt --ignore-errors --retries 10 --write-info-json --embed-subs --all-subs YOUTUBE_URL_HERE

Then, simply run in your BASH application: sh YOUR_SCRIPT_NAME.sh

Cool, right? Now you have a script with all your commands to download your favorite channels or unlisted/public playlists. One of the cool things with the BASH/Windows integration is that you can also make a Windows batch (.bat) file to launch it. Make a .bat file called whatever you want (runme.bat is my favorite) that calls the script you built, first changing to the directory containing YOUR script:

@ECHO off
echo Launching youtube download helper script in BASH...
timeout /t 5
bash -c "cd /mnt/e/youtube/;sh YOUR_SCRIPT_NAME.sh"
echo Completed!
timeout /t 60

Neat, now you can single click the .bat file to launch your downloads! BUT, there’s something else you can do now: Windows Task Scheduler. Go into “Task Scheduler” in Windows and create a simple task. Set it to whatever time you want and have it run daily/weekly/however you’d like; a roughly equivalent command line version is shown below.
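For reference, a scheduled task can also be created with schtasks; the task name, path and time here are placeholders, so adjust them to your own setup:

schtasks /Create /TN "YoutubeDownloads" /TR "E:\youtube\runme.bat" /SC DAILY /ST 03:00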

Have this task simply run your batch (.bat) file and it will fire off automatically as scheduled. That completes the automated downloads portion. I took this a step further, however, because a lot of my favorite music and music videos get taken down frequently and I wanted a nice, simple way to search and watch/listen to them. Enter Plex.

Grab a copy of Plex Media Server and install it on your Windows system. Having some horsepower here is definitely recommended: minimum quad core and 6GB+ of RAM (I’m running 8 cores and 12GB due to the extra processing for 40K+ videos).

Under Plex, configure a library of “Other Videos” and point it at the top directory your YouTube videos download into. It will then scan and add any videos it finds by name to make searching easier in the future. I also went into my Plex server options and configured it to check the library for changes every 12 hours or so to catch anything downloaded overnight. I wake up in the morning and my newly downloaded videos are processed and ready for viewing!

I hope you find this interesting and informative. This has been a long project and has gotten very complicated as I built a Perl-based wrapper to automate more of it. I encourage you to tinker and make this more effective and easier for your specific situation. Maybe a wrapper script to pull URLs from a file? Good luck and happy tinkering!
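As a starting point for that last idea, a simple (hypothetical) wrapper could read one URL per line from a channels.txt file and reuse the same flags as above:

#!/bin/bash
# download every channel/playlist URL listed in channels.txt, one per line
while read -r url; do
    youtube-dl -o '%(playlist)s/%(title)s.%(ext)s' --format bestvideo+bestaudio/best \
        --continue --sleep-interval 2 --download-archive PROGRESS.txt \
        --ignore-errors --retries 10 --write-info-json --embed-subs --all-subs "$url"
done < channels.txt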

Archiving youtube and website data

YouTube has become a bit of a dilemma for many people like myself who enjoy music and video edits made with that music; we love supporting the artists we enjoy along with the video edits. But with companies locking down on content, these videos and channels are going offline suddenly and often without warning. I’ve taken to downloading backups of these as often as possible. With a little help from r/DataHoarder, I now have a great setup that does this with minimal user intervention.

The fine folks over at r/DataHoarder swear by a tool called “youtube-dl”. For an example install on an Ubuntu WSL instance in Windows:

sudo apt install python-pip ffmpeg
sudo pip install youtube-dl

Then it’s just a matter of providing content to download:

youtube-dl -o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' --format bestvideo+bestaudio/best --continue --sleep-interval 10 --verbose --download-archive PROGRESS.txt --ignore-errors --retries 10 --add-metadata --write-info-json --embed-subs --all-subs https://www.youtube.com/channel/UCuCkxoKLYO_EQ2GeFtbM_bw

This outputs everything from the channel into its own directory (in this case “Uploads from Half as Interesting”), sleeps 10 seconds between downloads, stores info/subs, and records progress to prevent excessive traffic from re-downloading videos. This now runs on a dedicated system, called from Windows Task Scheduler once a week. The bonus is that I have several playlists set up for downloading; I simply tag a video into whichever playlist I choose and it’s downloaded automatically in the background for future perusal.

Now, what about backing up an entire website/directory/open directory? Well, there’s a handy tool for that too: wget

Over at r/opendirectories (I love Reddit), the lads and lasses have found some great data/images/videos/music/etc., and it’s always a rush to get those downloaded before they’re gone. In some cases it’s old software and images; other times it’s old music from another country, which is interesting to me and others. In this case, again using the Windows Subsystem for Linux (WSL), you could do something like the below:

/usr/bin/wget -r -c -nH  --no-parent --reject="index.html*" "http://s2.tinydl.info/Series/6690c28d3495ba77243c42ff5adb964c/"

In this case, I’m skipping the index files (not needed), the “-c” flag continues where it left off, and it downloads everything from that directory. This is handy for cloning a site or backing up a large amount of items at once. It can run for days and can choke on large files (I’ve only seen issues with files over 70GB; your mileage may vary), but it has worked well so far. I now have a bunch of music from Latin America in a folder for some reason.
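If the host is slow or you want to be gentler on it, wget can also pace itself; the flag values and URL below are just examples:

/usr/bin/wget -r -c -nH --no-parent --reject="index.html*" --wait=1 --limit-rate=2m "http://example.com/some/open/directory/"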

What are your thoughts? Do you see a lot of videos missing or being copyrighted? Do you have a better way of doing this? Let me know!