2017-08-05

Porting PSRAW to PowerShell Core: Lessons Learned


Intro

I took a significant break from commits to my PSRAW project and spent that time learning more about Open Source projects and Object Oriented Programming. Before I can move the project forward I have some architectural decisions to make. I don't have quite enough knowledge to make those decisions yet, but I'm getting there.

On July 14th, this blog post from Microsoft dropped a bit of a bombshell on the PowerShell community by making it clear that the path forward is PowerShell Core. Windows PowerShell will still be developed/maintained, but the primary focus will be PowerShell Core. There was also a call to test out PowerShell Gallery modules. I had put off the leap to Core for a while, but it seemed that now was the time. My module is still young and flexible, and I suspected that most of it would work on Core.

Shortly after that blog post, I unleashed the Kraken: I created a new branch in my local PSRAW repo named CoreRefactor, installed PowerShell Core 6.0.0-beta.4, switched VS Code to use PowerShell Core for the integrated terminal, and fired up my Pester tests. Thus flowed a glorious sea of red failures and errors. This kicked off two weeks of refactoring, and I wanted to share what I have learned from the experience.

At the time of this post, PowerShell Core 6.0.0-beta.5 has just been released and I have just finished testing against it, so everything in this post is relevant to at least that release.


Pester Differences

The first thing I noticed about my Pester tests on PowerShell Core was that Pester was creating as many errors as my code was. The Core implementation of Pester has changed the way Should processes arguments. Instead of parameters, it takes a collection of objects. That meant that all of my tests which use the parameter style were failing.

Describe 'Some Test' {
    Context 'Breaks in Core' {
        It 'Does not work' {
            'SomeString' | Should -BeOfType 'String'
        }
    }
    Context 'Works in Core' {
        It 'Works' {
            'SomeString' | Should BeOfType 'String'
        }
    }
}

Also, the BeLike Should operator is no longer available.

Describe 'Valid Should Operators' {
    It 'Does not include BeLike' {
        'Some text' | Should BeLike 'some*'
    }
}

I had to switch to Match for those and change the patterns to regex.
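As a rough illustration of that conversion, here is the same check written with the plain -like and -match operators. The regex patterns are my assumption of what an equivalent looks like, and the second sample string is made up for the example.

```powershell
# Wildcard form, as BeLike would have evaluated it (case-insensitive).
$likeResult = 'Some text' -like 'some*'

# Regex form. -match finds a hit anywhere in the string, so anchor the
# pattern to keep the same "whole string" meaning as the wildcard.
$matchResult = 'Some text' -match '^some.*$'

# Literal text that contains regex metacharacters should be escaped first.
$escaped = [regex]::Escape('v1.0 (beta)')
$escapedResult = 'v1.0 (beta) build' -match "^$escaped.*$"
```

In a test, the anchored pattern is what would be handed to Should Match in place of the old wildcard.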


No More System.Web

The System.Web assembly is not available in .NET Core. I primarily use System.Web.HttpUtility for query string parsing. The ability to create a dictionary from a query string and vice versa is very handy. I also use it for URL encoding and decoding. This functionality is available in .NET Core through Microsoft.AspNetCore, but this is not included with PowerShell and has a bunch of dependencies. I never did find anything in the included libraries that would do this kind of thing. Also, before anyone mentions [Uri]::UnescapeDataString(), note that it does not handle certain things properly (such as plus signs as spaces).

If anyone has some non-AspNetCore solutions to these problems, please let me know.
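One possible stopgap is to hand-roll the parsing. The sketch below is not what PSRAW uses; ConvertFrom-QueryString is a hypothetical helper name. It assumes form-style encoding, where a plus sign means a space, and it works around the [Uri]::UnescapeDataString() limitation by replacing plus signs before unescaping. It keeps only the last value for a repeated key.

```powershell
function ConvertFrom-QueryString {
    # Hypothetical helper for illustration; not part of PSRAW or PowerShell.
    [CmdletBinding()]
    [OutputType([System.Collections.Specialized.OrderedDictionary])]
    param(
        [Parameter(Mandatory = $true)]
        [AllowEmptyString()]
        [string]$Query
    )
    $result = [ordered]@{}
    # Drop a leading '?' or '#' so URL queries and fragments both work.
    $pairs = $Query.TrimStart('?', '#') -split '&' | Where-Object { $_ }
    foreach ($pair in $pairs) {
        # Split on the first '=' only; a value may itself contain '='.
        $parts = @($pair -split '=', 2)
        $rawValue = if ($parts.Count -gt 1) { $parts[1] } else { '' }
        # Turn '+' into a space BEFORE unescaping so an encoded %2B stays a literal plus.
        $key   = [Uri]::UnescapeDataString($parts[0].Replace('+', ' '))
        $value = [Uri]::UnescapeDataString($rawValue.Replace('+', ' '))
        $result[$key] = $value
    }
    $result
}

# Example: parse the fragment of a made-up redirect URL.
$parsed = ConvertFrom-QueryString -Query '#access_token=abc%2B1&state=hello+world&flag'
```

Going the other direction (dictionary to query string) would be the mirror image, using [Uri]::EscapeDataString() on each key and value.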

In PSRAW I was using this functionality primarily for the Implicit grant flow, as the OAuth Access Token is returned in the resulting URL as a query string in the fragment section. I was also using it to build the authorization URLs. I played around with including just the required libraries from AspNetCore so I could get this functionality back, but this ended up breaking other things unless I included all of the dependencies. There were also some version differences between the .NET Core that PowerShell is using and what some of the AspNetCore libraries require.

This all ended up being a waste of time anyway because there is:


No More System.Windows

This one should have been more obvious to me. System.Windows.Forms is used in PSRAW to create a mini browser to perform a login and application authorization on Reddit for the Code and Implicit grant flows. Once I had part of AspNetCore replacing the System.Web functionality, all hell broke loose on my GUI elements. I realized that if I wanted my module to be compatible with Core, I would need to drop the GUI elements. But with OAuth, that can be tricky, because certain grant flows require user interaction via a browser. I looked quickly into making this possible through the CLI, but it would require a significant amount of work.

Ultimately, I decided that I would drop the grant flows that require GUI interaction. Luckily for PSRAW, Reddit supports a 'password' grant flow that allows the user to get an OAuth Access Token using the user's credentials along with the application client id and client secret. I think the password grant flow would fit most use cases for a PowerShell based Reddit client.
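For reference, a password grant request can be sketched roughly as below. This is not PSRAW's code: the New-PasswordGrantRequest helper and its parameter names are hypothetical, and the endpoint and form fields are my reading of Reddit's OAuth2 documentation, so check them against the current docs. The function only builds the request parameters; nothing is sent.

```powershell
function New-PasswordGrantRequest {
    # Hypothetical helper for illustration only. Real code should accept
    # PSCredential objects instead of plain-text strings.
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)][string]$ClientId,
        [Parameter(Mandatory = $true)][string]$ClientSecret,
        [Parameter(Mandatory = $true)][string]$UserName,
        [Parameter(Mandatory = $true)][string]$Password
    )
    # The client id and secret travel as HTTP Basic authentication.
    $pair  = '{0}:{1}' -f $ClientId, $ClientSecret
    $basic = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($pair))

    # Return a hashtable suitable for splatting onto Invoke-WebRequest.
    @{
        Uri     = 'https://www.reddit.com/api/v1/access_token'  # assumed endpoint
        Method  = 'Post'
        Headers = @{ Authorization = "Basic $basic" }
        Body    = @{
            grant_type = 'password'
            username   = $UserName
            password   = $Password
        }
    }
}

# Placeholder values; no request is made here.
$request = New-PasswordGrantRequest -ClientId 'id' -ClientSecret 'secret' -UserName 'user' -Password 'pass'
# Sending it would look something like:
#   Invoke-WebRequest @request -UserAgent '<your application identifier>'
```

The point is that the whole exchange is a single POST with no browser involved, which is why this flow survives the loss of System.Windows.Forms.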

By removing the code and implicit grant flows, I could rip out a significant amount of code from the project. Many functions, methods, fields, and even a class could be removed. It also means I can revisit the required steps to start using Reddit and make it much simpler and less painful to get up and running.


Invoke-WebRequest is Very Different

On the surface, Invoke-WebRequest and Invoke-RestMethod in 6.0 look very similar to their 5.1 counterparts. Under the hood there are a few critical differences. 6.0 is using System.Net.Http.HttpResponseMessage whereas 5.1 is using System.Net.HttpWebResponse. You still get an HtmlWebResponseObject or BasicHtmlWebResponseObject returned to you. These look very much the same between 5.1 and 6.0, but when you look closely enough the back end difference starts to creep through.

Content-Type Header Buried (For Now)

In PSRAW, I am using Invoke-WebRequest instead of Invoke-RestMethod because I need several response headers. Reddit provides rate limit information through response headers, and in order to bake in automated rate limiting I can't receive just the JSON objects from Invoke-RestMethod. This means I need to do the JSON conversion to PSObject manually, but only after a Content-Type response header check.

The HttpResponseMessage class separates the content related headers from the rest of the response headers. The new base type is intended to be very strongly typed. This means that the Content-Type header is not available in the default HtmlWebResponseObject.Headers dictionary.

5.1:

$response = Invoke-WebRequest 'google.com'
$response.Headers.'Content-Type'

6.0:

$response = Invoke-WebRequest 'google.com'
$response.BaseResponse.Content.Headers.ContentType.ToString()

PowerShell is reserializing the HttpResponseMessage.Headers for both HtmlWebResponseObject.Headers and HtmlWebResponseObject.RawContent, but is not including the HttpResponseMessage.Content.Headers dictionary. That means the Content-Type is buried deep in the HtmlWebResponseObject.BaseResponse object and completely missing from the usual locations. This is why my checks for Content-Type were failing. None of the API responses were being seen by my module as application/json, and thus none were being automatically converted to PSObjects.

I have Pull Request #4494 in now to include the content header dictionary in both the RawContent and Headers serialization. In the meantime I have added this private function to get the Content-Type:

function Get-HttpResponseContentType {
    [CmdletBinding(
        ConfirmImpact = 'Low',
        HelpUri = 'https://psraw.readthedocs.io/en/latest/PrivateFunctions/Get-HttpResponseContentType',
        SupportsShouldProcess = $false
    )]
    [OutputType([String])]
    param
    (
        [Parameter(
            Mandatory = $true,
            ValueFromPipeline = $true,
            ValueFromPipelineByPropertyName = $true
        )]   
        [Microsoft.PowerShell.Commands.WebResponseObject]
        $Response
    )
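    Process {
        # Take the first non-empty value from the two places the type can live.
        @(
            # PowerShell Core 6.0: BaseResponse is an HttpResponseMessage
            $Response.BaseResponse.Content.Headers.ContentType.MediaType
            # Windows PowerShell 5.1: BaseResponse is an HttpWebResponse
            $Response.BaseResponse.ContentType
        ) |
            Where-Object {$Null -ne $_ -and $_ -ne [string]::Empty} |
            Select-Object -First 1
    }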
}
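
To show how the result of Get-HttpResponseContentType feeds the manual JSON conversion, here is a minimal sketch. ConvertFrom-ResponseBody is a hypothetical helper, not PSRAW's actual implementation. It takes the content type and body as plain strings so it can be tried without making a web request.

```powershell
function ConvertFrom-ResponseBody {
    # Hypothetical helper for illustration; not PSRAW's actual implementation.
    [CmdletBinding()]
    param(
        [AllowNull()]
        [AllowEmptyString()]
        [string]$ContentType,

        [Parameter(Mandatory = $true)]
        [AllowEmptyString()]
        [string]$Content
    )
    # The 5.1 value may carry parameters such as "; charset=UTF-8", so keep
    # only the media type. The -eq comparison below is case-insensitive.
    $mediaType = ("$ContentType" -split ';')[0].Trim()
    if ($mediaType -eq 'application/json') {
        $Content | ConvertFrom-Json
    }
    else {
        # Leave anything that is not JSON untouched.
        $Content
    }
}

# Sample values standing in for a real response.
$json = ConvertFrom-ResponseBody -ContentType 'application/json; charset=UTF-8' -Content '{"name":"psraw"}'
$html = ConvertFrom-ResponseBody -ContentType 'text/html' -Content '<p>hi</p>'
```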


Strict Header Validation is Now Default

Another change under the hood in 6.0 is that Invoke-WebRequest and Invoke-RestMethod now use System.Net.Http.HttpRequestMessage for creating HTTP requests. Due to the strict typing required in System.Net.Http, adding headers to the request before it is sent to the web server now applies strict validation to known header types by default. This was first reported in Issue #2895.

Pull Request #4085 was made to address it by adding a -SkipHeaderValidation switch (available now in 6.0.0-beta.5) to Invoke-WebRequest and Invoke-RestMethod. This is a kind of breaking change between 5.1 and 6.0 because the behavior in 5.1 is not to validate any headers. It will probably not be a major issue for most, but if you happen to work with an API that uses non-compliant values for well-known headers, this could be a pain point. It also breaks backwards compatibility because in 6.0 you now have to supply the -SkipHeaderValidation switch parameter. In 5.1 this parameter does not exist, so you will get a ParameterBindingException.

My work around for this is to add this to my PSRAW.psm1:

$PSDefaultParameterValues['Invoke-WebRequest:SkipHeaderValidation'] = $True

I think it is a neat little trick since $PSDefaultParameterValues will not cause errors on non-existent parameters.
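
A small self-contained way to see that behavior is sketched below. Get-Greeting and NoSuchParameter are made-up names for the demonstration. Defaults only apply to cmdlets and advanced functions, hence the [CmdletBinding()] attribute.

```powershell
function Get-Greeting {
    # Made-up advanced function used only for this demonstration.
    [CmdletBinding()]
    param(
        [string]$Name = 'World'
    )
    "Hello, $Name"
}

# One default for a parameter that exists, one for a parameter that does not.
$PSDefaultParameterValues['Get-Greeting:Name'] = 'Reddit'
$PSDefaultParameterValues['Get-Greeting:NoSuchParameter'] = $true

# The existing parameter picks up its default; the unknown one is ignored.
$greeting = Get-Greeting

# Clean up so the demonstration does not leak into the rest of the session.
$PSDefaultParameterValues.Remove('Get-Greeting:Name')
$PSDefaultParameterValues.Remove('Get-Greeting:NoSuchParameter')
```

Passing -NoSuchParameter explicitly on the command line would fail to bind, which is exactly why the default-value route is the safer way to support both 5.1 and 6.0 from one module.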

BUT, #4085 only applied the -SkipHeaderValidation switch parameter to the headers passed to the -Headers parameter. The User-Agent header is passed in from the -UserAgent parameter. It just so happens that Reddit requires a non-compliant User-Agent to be sent as an identifier of the application. Even with #4085 in place I was getting invalid format errors. I submitted Pull Request #4479 to extend the behavior of -SkipHeaderValidation to include -UserAgent, but it didn't get merged until a few hours after the 6.0.0-beta.5 release. So, look forward to that being fixed in 6.0.0-beta.6.


No SecureString on Linux

After a few more adjustments I finally had all of my tests passing in 6.0 and 5.1. But this was on Windows only. The best part about PowerShell Core is that it brings PowerShell to Linux! I figured my module, which mostly accesses an API and creates in-memory objects, had a very good chance of being cross-platform capable. I fired up a new Ubuntu 16 VM, installed the 6.0.0-beta.5 deb package, fixed a missing TEMP environment variable, and ran my Pester tests.

An immediate show stopper became apparent. I had a sea of red with "Unable to load DLL 'CRYPT32.dll'". A quick Google search turned up Issue #1654. PSRAW makes use of SecureString objects to keep access tokens and user passwords encrypted in memory. I use them in PSCredential objects, which makes for a great way to import and export them via Import-Clixml and Export-Clixml so they are also encrypted at rest between PowerShell sessions. This looks like it won't be fixed on Linux until PowerShell 6.1.0.
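
For anyone unfamiliar with the pattern, this is roughly what it looks like. The token value, user name, and file name are placeholders, not PSRAW's real ones. On Windows, Export-Clixml protects the SecureString with DPAPI, which is what gives the encryption at rest. On the Linux beta described above, this is the code path that was failing.

```powershell
# Build a credential whose password holds a (fake) access token.
$secureToken = ConvertTo-SecureString -String 'not-a-real-token' -AsPlainText -Force
$credential  = New-Object System.Management.Automation.PSCredential ('access_token', $secureToken)

# Round-trip the credential through CLIXML on disk.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'psraw-token-example.xml'
$credential | Export-Clixml -Path $path
$restored = Import-Clixml -Path $path
Remove-Item -Path $path

# Recover the plain text only at the moment it is needed.
$plainToken = $restored.GetNetworkCredential().Password
```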


Conclusion

I learned a great deal from this exercise, especially about the inner workings of both PowerShell Core and CoreFX. I have mixed feelings about the experience. On one hand, it was extremely rewarding to see all green tests for PSRAW on Core. On the other hand, I had to make some sacrifices in the project to move forward, and Linux compatibility is not quite there yet. I also really enjoyed submitting Issues and Pull Requests to the PowerShell Core project. I have now contributed to both PowerShell-Docs and PowerShell Core.

I still have quite a bit of documentation to fix now that so much has been ripped out of the project. I also have to document the few features I was in the middle of adding before I started this refactor. It looks like I will be moving to a major version for the next release, and it might not include all the things I promised for the next minor version release. However, I believe this puts the project in a better position to grow and also puts it slightly ahead of the curve. Hopefully I will have a release in the next few weeks.

I hope someone finds the info here helpful or at least found my struggles entertaining!

Join the conversation on Reddit!