Polished documentation

Ezri Brimhall 2024-10-09 09:58:38 -06:00
parent 92b6d52fbf
commit 5458efe4c1
Signed by: ezri
GPG Key ID: 058A78E5680C6F24
6 changed files with 61 additions and 16 deletions


@ -2,13 +2,15 @@
## Step 1 - Deploy Webserver
I was able to provision a new virtual machine on my Proxmox hypervisor and install the requisite software for this task. I chose to install Debian, as I have experience with it and it is well-suited for tasks such as these.
While I do have an existing Docker host VM, I decided to create a new one in order to fully document the process here and ensure that it was fresh in my mind.
Once it was running, I installed Docker, cloned the given repository, and built the Docker image. I then started it with a shell in order to explore the container.
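For reference, the manual exploration step looked roughly like this; the repository URL and image tag below are placeholders, not the real ones:
```sh
# Placeholder repository URL and image tag -- substitute the actual ones
git clone https://example.com/sysadmin-exercise.git
cd sysadmin-exercise
docker build -t sysadmin-exercise .
# Override the entrypoint with a shell so the container can be explored interactively
docker run --rm -it --entrypoint /bin/sh sysadmin-exercise
```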
The web app is a simple Django app with a single URL defined, which simply renders the `index.html` template. It has the admin app installed, but it doesn't actually seem functional.
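For illustration, a URL configuration that does just that might look something like the snippet below; the names and structure are assumptions on my part rather than code copied from the repository:
```python
# urls.py (hypothetical reconstruction, not the project's actual code)
from django.contrib import admin
from django.urls import path
from django.views.generic import TemplateView

urlpatterns = [
    path("admin/", admin.site.urls),
    # Single route that just renders the index.html template
    path("", TemplateView.as_view(template_name="index.html")),
]
```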
Once I had determined this, I wrote a simple Ansible playbook to deploy the container from scratch, including installing dependencies, pulling the latest Dockerfile, and building the container. I also added an entry to my Nginx configuration (at https://sysadmin-exercise.internal.ezri.dev/, accessible from the ITS building or USU VPN), as I consider HTTPS access to be part of deploying a web app unless told otherwise. I then visited the above site from my browser.
### Tasks
1. Create virtual machine in Proxmox hypervisor
@ -24,11 +26,19 @@ Once I had determined this, I wrote a simple Ansible playbook to deploy the cont
## Step 2 - Complete Additional Exercises
I completed three of these exercises (two Linux, one Python) and, since I was honestly having fun with them, decided to do all of the Linux and Python ones.
### Linux Exercises
- [Exercise 1: User Report](user-report)
- [Exercise 2: Watchdog Script](size-watchdog)
- [Exercise 3: Remove an Invalid Character](invalid-char)
- [Exercise 4: Condition Testing](condition-testing)
### Python Exercises
As most of my experience with Python is in programming and scripting, I wrote these solutions as executable Python files. However, they are all short and simple enough that they can be executed from a REPL (such as `ipython`) without issue, and in fact this is mostly how I was testing them.
- [Exercise 5: Log Parsing](python-logs)
- [Exercise 6: CSV Parsing](parse-csv)
- [Exercise 7: API Interaction](joke-api)

joke-api/README.md (new file)

@ -0,0 +1,18 @@
## Interact with an API with Python
I decided to write a simple script to demonstrate this in Python, but it can be done through a REPL as well. In fact, in my current position, I regularly use a Python REPL to introspect and debug APIs in much the same way as I have done in this script. I also use the Linux utility `jq` for this purpose, though its syntax is a bit less readily understandable.
I use the `requests` library, even though it's not part of the Python standard library, as it is, in my experience, the de facto standard for interacting with APIs in Python.
The basic code to download 10 random jokes and get just the programming ones is as follows, assuming that the `requests` library has been imported:
```python
response = requests.get('https://official-joke-api.appspot.com/jokes/random/10')
jokes = response.json()
programming_jokes = [joke for joke in jokes if joke['type'] == 'programming']
```
After this, the `programming_jokes` variable will contain a list of only programming jokes, which can then be further manipulated as I demonstrate in the script.
In the script, I filter the programming jokes using a `for` loop rather than a list comprehension, as I want to avoid duplicates. I do this by storing them in a dictionary rather than a list, indexed on the joke ID. This ensures that, should the API return a duplicate joke, the new version will simply overwrite the old one, and the length of the dictionary will not increase.
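Putting that together, the core of the de-duplicating loop looks roughly like this (a self-contained sketch; the actual script factors the download out into a helper and may differ in small details):
```python
import requests

programming_jokes = {}
while len(programming_jokes) < 5:
    jokes = requests.get("https://official-joke-api.appspot.com/jokes/random/10").json()
    for joke in jokes:
        if joke["type"] == "programming":
            # Keyed on the joke ID, so a repeated joke overwrites itself
            # instead of inflating the count
            programming_jokes[joke["id"]] = joke

print(f"Collected {len(programming_jokes)} unique programming jokes")
```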


@ -1,14 +1,12 @@
#!/usr/bin/env python3
import requests
import time


def download(url):
    # Fetch a URL and return the parsed JSON body
    response = requests.get(url)
    return response.json()

def main():
@ -18,9 +16,8 @@ def main():
    programming_jokes = {}
    while len(programming_jokes) < 5:
        # Download and parse a batch of random jokes
        jokes = download("https://official-joke-api.appspot.com/jokes/random/10")
        # Filter to just programming jokes and insert them
        for joke in jokes:
            if joke["type"] == "programming":

python-logs/README.md (new file)

@ -0,0 +1,13 @@
## Parse a log file with Python
As with my other Python exercises, I demonstrate this in an executable Python file, but these commands can just as easily be run from a REPL.
This demonstrates a simple log parser that will count any failures of the `pam_unix.so` PAM authentication module, which are the failures listed in the given log file. I decided to additionally track the number of failures per remote host, as that can be extremely valuable when investigating authentication failures on SSH servers.
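A minimal sketch of that check and the per-host tally is below; it assumes the standard `pam_unix` log line format (an `rhost=` field after the "authentication failure;" message), and the file name and variable names here are mine rather than the script's:
```python
from collections import Counter

total = 0
failures_by_host = Counter()
with open("auth.log") as f:  # placeholder path
    for line in f:
        if "authentication failure" in line:
            total += 1
            # pam_unix failure lines include an "rhost=" field identifying the client
            for field in line.split():
                if field.startswith("rhost="):
                    failures_by_host[field[len("rhost="):]] += 1

print(f"{total} authentication failures")
for host, count in failures_by_host.most_common():
    print(f"  {host or '(no rhost)'}: {count}")
```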
As stated in the large comment in the middle of the script, this check only works for some authentication setups. As I noted above, the given log file has authentication errors produced by PAM; however, in my experience, the default setup for SSH is to use its built-in password authentication mechanisms rather than the "keyboard-interactive" mode that passes authentication to PAM. The SSH password auth failure logs are formatted differently, and this script will not count them.
Additionally, if the PAM configuration is different from the default (such as 2-factor authentication, or LDAP bind authentication), the failure may not originate from `pam_unix.so`. It may not even originate from the authentication stack in PAM, as is the case for many Linux LDAP implementations that allow any user in the database to resolve and authenticate, implementing access control in the PAM "account" stack instead.
Finally, it will never report a failed authentication if the client only offers an SSH key and that key is rejected. No such failure appears in the given log, and the odds of encountering one in the wild are low, but it is worth keeping in mind.
In other words, this script works for the logs given, but will need to be modified depending on the system it is parsing logs for.


@ -19,7 +19,9 @@ def main():
        for line in f:
            # check for auth failure in line
            # NOTE: This is a rudimentary check, and will not work for all log formats. This was chosen for the log file provided.
            # As an example, it will not work when the system uses SSH's built-in password authentication, as those logs are formatted differently.
            # It will also not work if the authentication failure is logged by a different PAM module, or if authentication is successful but the PAM
            # account stack (user authorization) fails.
            if "authentication failure" in line:
                # add failure
                total += 1


@ -1,10 +1,15 @@
## Report Users with UID >= 1000
I decided to take a more complex approach to this task than would be necessary on a "standard" Linux installation, in order to make the script more robust. It would still need some tweaks on a University domain-joined computer (namely, checking the `lastlog` command and only printing users who have actually logged in, depending on the purpose of the report, since otherwise it would include _all_ accounts, as LDAP user providers usually handle authorization independently of user enumeration and authentication). However, for smaller lists of centralized users, or for a system that makes heavy use of ephemeral users managed by `systemd`, this script will work.
I did this mainly because I use centralized authentication on my personal computers, so I wanted to make sure I didn't provide a script that wouldn't even function on my own computers. It is, admittedly, less valuable to parse `getent passwd` enumeration when working with a large number of users in the central auth server.
However, what I see as the main purpose of a report like this -- getting a list of people who can log into a server -- would be better accomplished on a system with LDAP authentication by checking the LDAP settings on said server and doing a manual LDAP search based on those settings. That way, you wouldn't have to filter out all the users that are not allowed to log in (and will be blocked at the authorization stage by the PAM `account` LDAP module) but can still be resolved.
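For example, something along these lines would query the directory directly; the URI, base DN, and filter are placeholders, and the real values would come from the server's LDAP client configuration:
```sh
# Placeholders -- pull the real URI and base DN from the host's sssd/nslcd/pam_ldap config
ldapsearch -x -H ldaps://ldap.example.com -b "ou=people,dc=example,dc=com" \
    "(uid=*)" uid uidNumber cn
```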
As for some design decisions I made:
I use `awk` to do the filtering, rather than a shell `if` statement, because I'm already using it to format the output. The `cut` command would work for extracting the fields, but can't format the output in one go like `awk` can. An older version of the script did loop through the lines with a shell `for` loop and filter them with `if`, but even then I was formatting the output with `awk`.
I ended up with a one-liner after simplifying the script from a `for` loop. Ultimately, because `awk` operates on lines rather than the whole input, I decided that it would make more sense to send it the entire output of `getent` rather than one line at a time.
I also filter out the `nobody` user, as it exists on every Linux system and is unlikely to be relevant to the person asking for this report. However, if it should be included after all, re-adding it is trivial; simply remove the `grep` command from the pipe.
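For reference, the pipeline described above looks roughly like the following one-liner; the exact fields and output format in the actual script may differ:
```sh
#!/bin/sh
# Enumerate all users (local and centralized), drop the nobody account,
# then let awk filter on UID >= 1000 and format the report in one pass
getent passwd | grep -v '^nobody:' | awk -F: '$3 >= 1000 { printf "%s (UID %s): %s\n", $1, $3, $5 }'
```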