Playing with Logs – Regex
Recently I was playing with some sql event viewer logs. These logs don’t have properties that we can pull from to make life easier. So, everything has to be parsed through the string. Which isn’t bad, but it’s a challenge to think about. Here is what one of the strings looks like:
Login failed for user 'db_admin'. Reason: Password did not match that for the login provided. [CLIENT: 10.0.80.55]
I wanted three items from this list, the username, the reason, and the IP address. The username is inside the single quotes. The reason goes from the word reason to the [ for client. The IP address is inside the brackets for the client. Lets get started. First lets get the data.
$test = invoke-command -ComputerName servername -ScriptBlock {Get-WinEvent -FilterHashTable @{logname='application';providername='MSSQLSERVER';Keywords='4503599627370496'}}
Grabing Everything between double single qoutes.
Now we have the data, it’s time to get the username. As I said, the username is found inside the single quotes. So we want to select the string with a pattern that pulls the double single quotes.
($t.message | Select-String -Pattern "'.*?'" -AllMatches).Matches.Value -replace "'",""
‘.*?’
This bad boy here grabs everything between the single quotes. The ‘—‘ is the boundaries. the . says to match any character except the terminators. Yeah, we don’t want skynet. Then the X? tells it to match the previous token between 0 and forever times. The select starts off with the first ‘ and then moves to the next character. which is a wild card. Then we search all wildcards forever until the next ‘ with the *? characters.
We select every item in the string that matches that pattern using the -AllMatches. Then we grab the Matches by using the ().Matches. We want those values so we select the value from the matches. ().Matches.Value. This still selects the double single quotes ‘ and we really don’t want this. So we simply remove them by using the -replace command. We replace the ‘ by saying -replace “‘”,””. Looks a little confusing but it works.
Grabbing Text after a word and before a symbol.
The next part is to grab the reason. This is basically grabbing everything after a single word and before something different. The logic is the same as before, but this time we are grabbing it based off a word.
Reason = (($t.Message | Select-String -Pattern "Reason:.*\[" -AllMatches).Matches.Value -replace ' \[','') -replace 'Reason: ',''
“Reason:.*\[“
In this instance we are searching for the word Reason:. Once we find that word, we select the first object in front of it using a wild card again. The wild card is a . like before. Then we tell it to continue searching using the * until we reach special character of [. Notice the \ is before the [. The reason for this is because in the world of Regex the bracket, [, is a special character used for searching. Thus this code is saying, start with the word reason: and search everything until you reach the square bracket.
Once we have the pattern we select all the matches like before with the -allmatches. We then select the matches and the values using the ().matches.value commands. From there we want to remove the square bracket and the word reason:. We do that with replace commands to remove the word reason we use -replace (“Reason: “,”) and to remove the extra space and square bracket we us -replace (‘ \[‘,”). Notice once again, the \ before the square bracket.
Pulling an IP address from a string
The next thing we need is the IP address of the log. The IP address is located in the square brackets with the word client inside of it. The key here is not to search those brackets. We want the IP address of any and all strings. We want all the IP addresses and not just one.
IPAddress = ($t.message | Select-String -Pattern "\d{1,3}(\.\d{1,3}){3}" -AllMatches).Matches.Value
\d{1,3}(\.\d{1,3}){3}
This one is much more complex than the last one. The first thing we do is look for is up to three digits side by side. \d means digits. The {1,3} means between 1 and 3. We do this until we reach a . mark. Then we repeat the process again 3 times. We use the () to create a group. Inside that group, we have the \. which is the decimal point followed by the \d{1,3} again. Saying after the decimal point looks for up to three digits again. Finally, we tell the code to do this 3 times with the {3} tag at the end of the group.
Like before we use the -allmatches flag to get all the matches and pipe it out using the ().Matches.value method. But wait! This only pulls the IP address format, not the IP address. This works for an IP address of 512.523.252.1 which we all know is an invalid IP address. To do that we have to dive much deeper into regex. This next part is complex.
Validate an IP address
The above gives an idea of what we want to look for, a starting point, here is the big fish that we need to break down. This code is a bit longer. This code is broken up between whatif structures. Which is pretty cool. It’s going to take some effort to explain it all. So we will take one group at a time with each whatif | statement. Remember each group represented with () is for a single octet. I am going to try my best at explaining this one. I’m not fully sure, but once again, I will try.
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-5][0-9]|[01]?[0-9][0-9]?)$
25[0-5]|
Our first what if statement is the 25[0-5]. The maximum number for an octet is 255. In fact, 255.255.255.255 is the broadcast address. In this what-if statement, we are looking at matching the first two characters of an octet as 25 then we are looking for any number after that as a 0 to 5. So, anything like 256 would be ignored.
2[0-4][0-9]|
The next one is for the middle of the 255 range. This tests to see if the range is 2xy. The x is between 0 and 4 and the y is 0 and 9. I’ll be honest, When I changed out the 4 with a 5, the value of 256 was accepted when it shouldn’t have been. There is something special later on on this one.
[01]?[0-9][0-9]?
Now we are matching if the first character is a 1. So, this will cover the 100-199 ranges. If there is a 1 there, if not, then it will cover the 0-99. The ? matches the previous item between 0 and one times. So the previous item is a 1 or a number. This creates the nice 0-199 range.
\.){3}
The \ symbol is our break symbol to break the regex processing on the next item which is our . symbol. Normally . means a wild card. In this case it means a period or decimal point. Then we close the group with our closing parentheses, ). Now we have made a group that determines if an object is between 1 and 255 with a period at the end, we need to do this 3 times. There is multiple ways to do this. We can repeat the code 3 times or we can just say {3}. That’s what we did here.
(25[0-5]|2[0-5][0-9]|[01]?[0-9][0-9]?)
Finally, we repeat the code again. This time without the \. at the end. The reason we do this is that the last octet doesn’t have a period at the end. This matches between 1-255 without the period.
Grabbing Mac Addresses from a string
Another item I pull from logs is mac addresses. Most of the time these are security logs from a firewall. However, it’s important to be able to pull a match address. The big thing between a mac address and an IP address is the mac address requires letters and numbers. They also come in all forms of delimiters. the most common are :, ., –, and a space for the truly evil people. Thus, you have to address each of these items. Here is the code:
([0-9a-fA-F]{2}[: \.-]){5}([0-9a-fA-F]{2})
[0-9a-fA-F]{2}
The first part of the code is looking for the numbers 0 – 9. For example 0F:69:0F:FE:00:01 is the code. The first section is 0F. The 0 – 9 helps us find the 0. The next part a – f helps us find a,b,c,d,e, and f. The A – F helps us find A,B,C,D,E, and F. This way we don’t have to worry about case sensitivity when searching for the logs as some logs don’t capitalize the letters. Finally, we are going to search this two times before our next symbol.
[: \.-]
This next part is the search for the different symbols. Notice we have the common ones in there. The first is the standard colon :. It is followed by a space and then the escape character because the period is a wild card character. Which is then followed by the hyphen – as ipconfig /all gives you hypens. It’s all within the search brackets.
(){5}
We then close up the group with our () marks. This will search for at least one of those that match. We want 5 items in a row for a mac address. Mac addresses contain 6 sections. So, the next code is important to find that 6th. We search for the 5 in the row by the {5} mark.
([0-9a-fA-F]{2})
We repeat the code over again. This way we get that last section of the mac address. This time tho, we don’t have the search for the unique symbols as the last section of a mac address doesn’t have one.
Putting these in a script
If you read my blog often, you know I like functions. Here are two functions you can add to your tool bag.
Get-SHDIPFromString
Function Get-SHDIPFromString{
[cmdletbinding()]
Param (
[string]$String
)
foreach ($string in $string) {
($String | Select-String -Pattern "((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" -AllMatches).Matches.Value
}
}
Example:
PS> Get-SHDIPFromString -String "This message contains 192.168.25.5 as an IP address."
192.168.25.5
Get-SHDMacFromString
function Get-SHDMacFromString {
[cmdletbinding()]
Param (
[string[]]$String
)
foreach ($string in $String) {
($String | Select-String -Pattern '(?<mac>([0-9a-fA-F]{2}[: \.-]){5}([0-9a-fA-F]{2}))' -AllMatches).matches.value
}
}
Example:
PS> Get-SHDMacFromString -String "Physical Address. . . . . . . . . : 11-35-AF-FE-11-A1"
11-35-AF-FE-11-A1