Roll Your Own Tailscale MagicDNS Using pfSense and Unbound

The Why

The first question that comes to mind is: why bother? Tailscale implements a perfectly usable MagicDNS feature. Well, there are a few reasons I quickly discovered after telling myself “Looks great, let’s enable it”:

  1. Even though the issue is closed, I still very often get DNS problems on Android when Tailscale with MagicDNS is active.
  2. When on the home network with Tailscale inactive, you have to remember two URLs: device.something-something.ts.net and device.mydomain.com. That means extra trouble for services that only support one trusted URL (Tiny Tiny RSS, joplin-server).
  3. You can’t get SSL certificates from Let’s Encrypt; you have to obtain them through Tailscale HTTPS.

Basically, the one I care the most about is this: my work laptop is already connected to the work tailnet. But I also want it to be able to reach a few of my home services (nextcloud, joplin-server), at least when I’m at home – which, let’s face it, is 90% of the time – without switching connections all the time.

One solution would be to use only the hostname, without the FQDN, as the URL. DNS search domains would resolve it to either the tailnet URL or device.mydomain.com, but I dislike that approach and it doesn’t work for SSL.

Another would be to share devices between the work and home tailnets, but I won’t do that, for obvious reasons.

So in the words of one of my favourite youtubers “I Make a New One!”

Set Up Tailscale and Unbound on pfSense

Installing them is beyond the scope of this post: you should already have Unbound and the official Tailscale package set up on pfSense. These additional settings are needed for our setup:

Use DHCP Registration, Host Overrides, or both to add machine1.mydomain.com and machine2.mydomain.com with their internal IPs to Unbound. Of course, you can have as many as you want.

Network interfaces need to be set to All, because the Tailscale interface doesn’t show up in the Unbound interfaces list. Use firewall rules to shield Unbound on unwanted interfaces, e.g. the WAN.

Then add the following in the Custom options box:

server:
  access-control-view: 100.64.0.0/10 ts_view

view:
  name: "ts_view"
  view-first: yes
# machine1
  local-data: "service1.mydomain.com. 90 IN A 100.aa.bb.cc"
  local-data: "service2.mydomain.com. 90 IN A 100.aa.bb.cc"
# machine2
  local-data: "service3.mydomain.com. 90 IN A 100.aa.bb.dd"

What this does: when a request arrives via the 100.64.0.0/10 network (Tailscale), Unbound answers with the IPs defined in “ts_view” instead of what’s registered in overrides or DHCP. Incidentally, you can also use this feature to answer with different IPs depending on the network a request arrives from. (I use it for storage servers serving multiple VLANs.)
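
As a sketch of that multi-VLAN trick (the subnets, view names and IPs below are made up for illustration, not taken from my config), the same storage hostname can resolve differently per source network:

```
server:
  access-control-view: 10.0.10.0/24 vlan10_view
  access-control-view: 10.0.20.0/24 vlan20_view

view:
  name: "vlan10_view"
  view-first: yes
  local-data: "storage.mydomain.com. 90 IN A 10.0.10.5"

view:
  name: "vlan20_view"
  view-first: yes
  local-data: "storage.mydomain.com. 90 IN A 10.0.20.5"
```

With view-first enabled, anything not overridden in a view still falls through to the normal overrides and DHCP registrations.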

Go to Access Lists and add a new one to allow requests from the Tailscale network:
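
For reference, the GUI access list boils down to an Unbound access-control statement; written directly in the Custom options box it would look something like this (a sketch of the equivalent, not pulled from my actual config):

```
server:
  access-control: 100.64.0.0/10 allow
```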

Setup DNS on Tailscale

Go to Tailscale Admin Console => DNS

Add custom nameservers for your domain (mydomain.com in this example):

Replace 100.80.1.2 with the Tailscale IP of your pfSense box (or whatever machine you use for this split-DNS setup).

DONE

Once this is setup what happens:

When a machine on your home LAN without Tailscale looks up service1.mydomain.com, pfSense returns the machine’s internal IP. Good.

When a machine running Tailscale looks up service1.mydomain.com, Tailscale intercepts the query and pipes it to your pfSense over the Tailscale interface. Unbound then responds with the Tailscale IP you defined in the custom view. Good.

If a machine at home overrides your home DNS (maybe a work VPN forces other DNS servers), resolution fails, since no other DNS server knows about service1.mydomain.com. Bad.

Ubuntu 22.04 Dislikes RSA Keys

I wrote in a past article about how I set up Hetzner dedicated servers with full disk encryption even though they lack an iKVM, and why Debian 10 machines require an RSA key for this.

But after switching one of my workstations to Ubuntu 22.04, I was unable to log in using this RSA key. Running ssh with debugging enabled showed the likely culprit:

debug1: Offering public key: /home/user/.ssh/id_rsa
debug1: send_pubkey_test: no mutual signature algorithm

The message put me on the right track: Ubuntu 22.04 has disabled RSA key support by default. I’m not arguing with that – I don’t really like using RSA since better alternatives are around, so I don’t want to change the default – but I would still like to be able to reboot my Debian 10 servers. One command-line option later, I can use RSA keys only when I want them:

ssh -o PubkeyAcceptedKeyTypes=+ssh-rsa root@1.2.3.4
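
To avoid typing the option every time, the same thing can live in ~/.ssh/config, scoped to just the legacy hosts (the host alias and IP below are placeholders):

```
# ~/.ssh/config — allow RSA only for this legacy Debian 10 host
Host debian10-hetzner
    HostName 1.2.3.4
    User root
    PubkeyAcceptedKeyTypes +ssh-rsa
    IdentityFile ~/.ssh/id_rsa
```

Newer OpenSSH releases also accept the option under its current name, PubkeyAcceptedAlgorithms; the older spelling still works as an alias.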

Full Disk Encryption on Hetzner Dedicated and Debian 10 Woes

Though absolutely nobody knows about or uses it, I maintain an ansible role that can set up a Debian or Ubuntu machine with full disk encryption on Hetzner Robot (bare-metal dedicated machines).

But wait, you shout, Hetzner usually runs consumer-grade hardware without KVMs – how do you enter your password at boot time? Easy: the role sets up a minimal boot environment with a dropbear SSH server where you can log in and run cryptroot-unlock.
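
One practical note on that unlock round trip (host alias, IP and file name below are placeholders): the initramfs dropbear presents a different host key than the booted system on the same IP, so keeping its key in a separate known_hosts file avoids host-key-changed warnings:

```
# ~/.ssh/config
Host unlock-myserver
    HostName 1.2.3.4
    User root
    UserKnownHostsFile ~/.ssh/known_hosts.initramfs
```

After a reboot, `ssh unlock-myserver cryptroot-unlock` prompts for the passphrase and lets the boot continue.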

While developing the role I realized it was impossible to unlock a Debian 10 machine. Even though I was 100% sure ansible was adding the proper key, logging in to the boot environment was impossible; I kept getting:

Permission denied (publickey).

I lost some good hours troubleshooting, certain that ansible was somehow not adding the proper key. Then I searched the web and realized the dropbear version shipped with Debian 10 does not support the ed25519 keys I so cheerfully use for their added security and elegant shortness.

So the fix, for Debian 10 machines, was to maintain an RSA key to use when logging in to boot them.
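
Generating such a dedicated key is a one-liner (the file path and comment are just examples; -N "" means no passphrase, adjust to taste):

```shell
# Create a 4096-bit RSA key used only for unlocking the Debian 10
# boot environments.
mkdir -p ~/.ssh
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa_dropbear -C "dropbear-unlock"
```

The .pub side then goes wherever the role expects the dropbear authorized keys.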

I’m Back

For years I used to run both a tech blog and a blog in my native language. At some point I realized I hadn’t posted in years, backed everything up and shut it all down.

Since then a few things happened: I got into 3D printing, I have a sometimes-interesting $dayjob, and I gave up on Facebook (I was never very active on Twitter). All of this adds up to me wanting to share stuff (I’m sure all 3 of you reading this will be happy) and having no place to do it. So, without further ado:

i’m back baby (the Bender meme)

P.S. If you’re wondering what’s with all the 2009–2016 posts already here: those are imported from my old blog. When I have time I recover a few of them, especially the ones I still find relevant. For kicks and giggles I might even recover some Symbian stuff I posted around 2009. Depending on my mood, and on whether I remember to push the relevant buttons, old articles may keep their original dates or appear as posted recently.

Oracle Complains About a Long Identifier on Simple Operations

Soo,

I’m going head first into the Oracle DB world. I was trying to create an spfile from the pfile and of course it didn’t work:

SQL> create spfile from pfile="/oracle/app/product/12.1.0/dbhome_1/dbs/initORCL.ora";
create spfile from pfile="/oracle/app/product/12.1.0/dbhome_1/dbs/initORCL.ora"
*
ERROR at line 1:
ORA-00972: identifier is too long

The reason is simple enough: you have to use single quotes instead of double quotes. But it took me a while to figure this out, so here it is for all the other beginners:

SQL> create spfile from pfile='/oracle/app/product/12.1.0/dbhome_1/dbs/initORCL.ora';

File created.

SQL>

Errors in mail.log from nagios check_ssmtp

I got my nagios server banned by fail2ban because of errors in postfix’s mail.log. I know I could simply whitelist the nagios server, but I prefer things working perfectly.

Checking the logs I could see this error repeating itself on each check:

Mar 25 13:01:13 xxx-123 postfix/smtpd[17065]: connect from nagios.example.com[1.2.3.4]
Mar 25 13:01:13 xxx-123 postfix/smtpd[17065]: improper command pipelining after QUIT from nagios.example.com[1.2.3.4]:
Mar 25 13:01:13 xxx-123 postfix/smtpd[17065]: disconnect from nagios.example.com[1.2.3.4]

Apparently postfix is picky about extra input after a QUIT or DATA command; see details here.

It turns out I hadn’t updated the nagios plugins in a while. Even though I kept nagios itself up to date, the plugins were at 2.0.3. Updating to 2.1.1 fixed the issue, and now I simply see a connect/disconnect in the postfix logs when nagios performs a check.

AIX git SSL woes

Oh joy and happiness, I have to admin AIX boxes. One of the first things I hit was git erroring out while cloning some stuff from GitHub:

SSL certificate problem: unable to get local issuer certificate

Yep, simple problem: no SSL CA bundle on the system. You can take the bulldozer approach with either:

export GIT_SSL_NO_VERIFY=true

or:

git config http.sslVerify false  (undo with: git config --unset http.sslVerify)

…because who cares about MITM attacks, especially against software deployed on production servers.

Or you can actually fix the issue and install a CA bundle. I downloaded mine from the curl site: https://curl.haxx.se/docs/caextract.html

I downloaded the cacert.pem file and configured git to use it like this:

wget --no-check-certificate  https://curl.haxx.se/ca/cacert.pem -O /var/ssl/cacert.pem
git config --system  http.sslcainfo /var/ssl/cacert.pem

The --no-check-certificate is required because at this point wget has no way of checking the certificate either. If you want to ensure the file’s validity, download it on a working system and scp it to the problem server.

Use Vagrant on Windows with Ansible under Cygwin

If for some reason you have to run vagrant under Windows and plan on using Ansible, you will need a couple of wrappers.

ansible-playbook.bat has to be in the Windows PATH;

ansible-winpath-playbook.sh is called by ansible-playbook.bat to convert Windows-style paths into *nix-style paths that ansible under cygwin can understand.
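
As a rough sketch of what the shell side can look like (everything below is illustrative, not the actual files; under Cygwin, `cygpath -u` can replace the sed-based conversion):

```shell
#!/bin/sh
# ansible-winpath-playbook.sh (sketch): rewrite Windows-style path arguments
# into POSIX paths before handing everything to ansible-playbook under Cygwin.

# 'C:\project\site.yml' -> '/cygdrive/c/project/site.yml'
# (GNU sed; \L lowercases the captured drive letter)
win2posix() {
  printf '%s\n' "$1" | sed -e 's|\\|/|g' -e 's|^\([A-Za-z]\):|/cygdrive/\L\1|'
}

# Rebuild the argument list, converting anything that looks like a
# Windows path (a drive letter followed by a colon).
for arg in "$@"; do
  case "$arg" in
    [A-Za-z]:*) set -- "$@" "$(win2posix "$arg")" ;;
    *)          set -- "$@" "$arg" ;;
  esac
  shift
done

# Hand over to the real ansible-playbook (guarded so the sketch stays
# runnable on systems where it isn't installed).
if command -v ansible-playbook >/dev/null 2>&1; then
  exec ansible-playbook "$@"
fi
```

ansible-playbook.bat can then be a one-liner along the lines of `bash --login /path/to/ansible-winpath-playbook.sh %*` (hypothetical path), so Vagrant finds an ansible-playbook command in the Windows PATH.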

Verify SMART details for members of an Intel RST RAID volume

Sooo,

Be it because of the beta BIOS update or because of my drives, my RAID10 keeps failing. I documented before how to repair such a broken array, but I didn’t want to go through that too many times, as data corruption is only one step away. Knowing that at least one of the disks has minor issues (mdadm kicked it out some time ago, back when the disks were running under Linux), I decided to check the SMART details and keep only two of the disks, in RAID1. I was curious whether one can read SMART details while the disks are still members of the Intel RST array. Since I had moved all the data off the disks, it was safe to test.

I found out that the Intel SSD Toolbox shows SMART data for all disks in a system, not only SSDs and not only Intel ones. Look under Other Drives and scroll to the right, as the RAID volumes show up under Intel Solid-State Drives.

Intel RST RAID Non-RAID Disk after BIOS update

So, having nothing better to do and for no good reason, I decided to update my workstation’s BIOS to the latest version released by Gigabyte – because ignoring the “if it works, don’t fix it” mantra is always a good idea. Beautiful: after the update, two of the disks in my four-disk RAID10 array showed up as Non-RAID Disk. I had backups, but shuffling 2TB+ of data is never fun.

Initial reports were all grim: the Intel RST BIOS does not allow repairing. Thankfully a good soul had already found the answer – source thread here; thank you, adamsap.

Usual disclaimer: this worked for me, I have no guarantee it will work for you, and the method is not advertised as working and/or supported by Intel.

  1. Reset the volume (all disks) as non-member in the Intel BIOS. Ignore the warning that all data will be lost; the utility only touches the metadata related to RAID membership.
  2. Create a new array with all the same disks, and be sure to use the same settings for strip size, RAID type, etc. I was in luck, since my array was still visible (some disks were still attached as members).
  3. Download TestDisk from http://www.cgsecurity.org. I used the Windows version since my Windows install was on a different disk. I had never heard of this utility before, but it seems to be really, really useful for data recovery.
  4. Run TestDisk after reading the steps on their site. Be sure to read the documentation there so you know what you are doing. In brief (so I’m sure you read the original docs): search for your partition(s) on the RAID volume – if everything was recreated with the same settings it should find them within a few seconds – and save the partition table.
  5. After the partition table is saved reboot.
  6. The array should be back with all the data.

I compared checksums for some of the data against backups and it turns out everything is back.