COVID19 Plotting with Julia


We source our data from covidtracking.com and process it with Julia. We use the national daily.csv, which can be downloaded and loaded like this:

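# Pipe.jl provides the @pipe macro used below
using CSV, DataFrames, Pipe

daily_url = "https://covidtracking.com/api/v1/us/daily.csv"
# download to a temp file, parse it with CSV.jl, then materialize a DataFrame
df_US = @pipe download(daily_url) |>
        CSV.File(_; types=Dict(:date => String)) |> DataFrame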
This parses the CSV file and loads it into a DataFrame. The :date => String hint tells CSV.jl to keep the date column as a String so we can convert it to a Date later. We can check the column names and see what the first row contains:
@show names(df_US)
@show first(df_US)
names(df_US) = ["date", "states", "positive", "negative", "pending", "hospitalizedCurrently", "hospitalizedCumulative", "inIcuCurrently", "inIcuCumulative", "onVentilatorCurrently", "onVentilatorCumulative", "recovered", "dateChecked", "death", "hospitalized", "lastModified", "total", "totalTestResults", "posNeg", "deathIncrease", "hospitalizedIncrease", "negativeIncrease", "positiveIncrease", "totalTestResultsIncrease", "hash"]
first(df_US) = DataFrameRow
│ Row │ date     │ states │ positive │ negative │ pending │ hospitalizedCurrently │ hospitalizedCumulative │ inIcuCurrently │ inIcuCumulative │ onVentilatorCurrently │ onVentilatorCumulative │ recovered │ dateChecked          │ death  │ hospitalized │ lastModified         │ total    │ totalTestResults │ posNeg   │ deathIncrease │ hospitalizedIncrease │ negativeIncrease │ positiveIncrease │ totalTestResultsIncrease │ hash                                     │
│     │ String   │ Int64  │ Int64    │ Int64    │ Int64?  │ Union{Missing, Int64} │ Union{Missing, Int64}  │ Int64?         │ Int64?          │ Union{Missing, Int64} │ Union{Missing, Int64}  │ Int64?    │ String               │ Int64? │ Int64?       │ String               │ Int64    │ Int64            │ Int64    │ Int64         │ Int64                │ Int64            │ Int64            │ Int64                    │ String                                   │
├─────┼──────────┼────────┼──────────┼──────────┼─────────┼───────────────────────┼────────────────────────┼────────────────┼─────────────────┼───────────────────────┼────────────────────────┼───────────┼──────────────────────┼────────┼──────────────┼──────────────────────┼──────────┼──────────────────┼──────────┼───────────────┼──────────────────────┼──────────────────┼──────────────────┼──────────────────────────┼──────────────────────────────────────────┤
│ 1   │ 20200703 │ 56     │ 2786059  │ 31427438 │ 2237    │ 37750                 │ 247284                 │ 5589           │ 10936           │ 2049                  │ 1059                   │ 883561    │ 2020-07-03T00:00:00Z │ 122158 │ 247284       │ 2020-07-03T00:00:00Z │ 34215734 │ 34213497         │ 34213497 │ 635           │ 1562                 │ 663492           │ 57562            │ 721054                   │ 1c411a99c1d22058187d2c2c5c2b26c93dda480a │
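To see what the types hint changes, here is a small sketch that parses a made-up two-row CSV from a string instead of downloading anything (the rows are illustrative, not real data). Without the hint, CSV.jl infers the yyyymmdd values as integers; with it, the column stays text.

```julia
using CSV, DataFrames

# Two made-up rows shaped like daily.csv (values are illustrative only)
csv_text = """
date,positiveIncrease
20200703,100
20200702,90
"""

# No hint: the yyyymmdd column is inferred as an integer column
df_int = CSV.File(IOBuffer(csv_text)) |> DataFrame

# With the hint: the column is kept as text, ready for Date parsing
df_str = CSV.File(IOBuffer(csv_text); types=Dict(:date => String)) |> DataFrame

println(eltype(df_int.date))
println(eltype(df_str.date))
```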

Let's make a simple bar plot of the new daily cases. We can use the positiveIncrease column directly:

using Plots
# reverse, because the newest date comes first in the data
bar(df_US.positiveIncrease |> reverse)

We actually want to work with real dates and also limit the date range a bit. First, convert the date column from String to Date:

using Pipe, Dates
df_US.date = @pipe df_US.date .|> Date(_, dateformat"yyyymmdd")
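For reference, here is the same conversion on a few hand-written strings, using only the Dates standard library (the values are illustrative). Date.(raw, fmt) is the plain-broadcast equivalent of the @pipe/.|> line above, and comparing dates elementwise gives the kind of boolean mask used for truncation next.

```julia
using Dates

# Raw yyyymmdd strings as they come out of the CSV (illustrative values)
raw = ["20200703", "20200316", "20200315"]

# Build the format once and reuse it for every element
fmt = dateformat"yyyymmdd"

# Dot-broadcasting calls Date(s, fmt) on each string
dates = Date.(raw, fmt)

# Elementwise comparison returns a BitVector we can use to index rows
mask = dates .> Date(2020, 3, 15)

println(dates)
println(mask)
```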
Then we can truncate the date range and sort it oldest-first:
df_US_trunc = df_US[df_US.date .> Date(2020, 3, 15), :]
sort!(df_US_trunc, :date) #sort in place
@pipe df_US_trunc |> 
      bar(
          _.positiveIncrease, 
          title="US Daily New Cases since 3/15",
          label=nothing,
          xlabel="Day since 3/15"
      )
The Pipe.jl package helps clean up the syntax: inside @pipe, _ stands for the value coming in from the left, and x |> f is just a different way of writing f(x) that reads more clearly in some cases.
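A small sketch of both forms, with a hypothetical square helper and arbitrary numbers. The _ placeholder is what lets a piped value go into a function that takes more than one argument.

```julia
using Pipe

# Hypothetical helper, just for illustration
square(x) = x^2

# x |> f is the same call as f(x)
println(3 |> square)  # 9

# _ marks where the piped value goes, including next to keyword arguments
v = @pipe [3, 1, 2] |> sort(_; rev=true) |> first(_, 2)

# The same computation written as nested calls
w = first(sort([3, 1, 2]; rev=true), 2)

println(v)  # [3, 2]
```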

Finally, let's add a 7-day simple moving average (SMA) to smooth out the annoying man-made 7-day reporting cycle.

using Indicators
@pipe df_US_trunc |> 
plot!(
    sma(Float64.(_.positiveIncrease), n=7),
    lw=3, label="7 day moving avg.", legend=:topleft
)

Here, sma has a small 'bug': it doesn't accept an Int64 vector, hence the Float64.(...) conversion. It will probably get fixed later.
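If you'd rather not depend on Indicators.jl, a trailing SMA is easy to write by hand. This is a minimal sketch, not Indicators.jl's implementation: trailing_sma is a hypothetical helper, and filling the first n-1 points with NaN is our own choice, so check how sma pads its output before mixing the two.

```julia
using Statistics

# Hypothetical helper: trailing simple moving average over windows of n points.
# The first n-1 entries have no full window, so we fill them with NaN
# (our assumption; Indicators.jl may pad differently).
function trailing_sma(x::AbstractVector{<:Real}, n::Integer)
    out = fill(NaN, length(x))
    for i in n:length(x)
        # mean of the n most recent values, ending at position i
        out[i] = mean(@view x[i-n+1:i])
    end
    return out
end

# Integer input works directly, since mean returns a Float64
cases = [10, 20, 30, 40, 50, 60, 70, 80]
r = trailing_sma(cases, 7)
println(r)
```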

Using the same pipe syntax, we can easily extend the daily-cases plot into a grid of plots for different states.

df_States = @pipe download("https://covidtracking.com/api/v1/states/daily.csv") |>
        CSV.File(_; types=Dict(:date => String)) |> DataFrame
df_States.date = @pipe df_States.date .|> Date(_, dateformat"yyyymmdd")
select!(df_States, [:date, :state, :positive, :negative, :positiveIncrease])
dropmissing!(df_States)
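To check these data-prep steps and the date-and-state mask that p_state builds, here is a sketch on a made-up four-row table (the numbers are invented, and the hash column just stands in for the columns select! throws away).

```julia
using DataFrames, Dates

# Made-up rows shaped like the states data, newest first as in the API
df = DataFrame(
    date  = [Date(2020, 7, 3), Date(2020, 7, 2), Date(2020, 7, 3), Date(2020, 7, 2)],
    state = ["MA", "MA", "CA", "CA"],
    positiveIncrease = [200, missing, 5000, 4000],
    hash  = ["a", "b", "c", "d"],
)

select!(df, [:date, :state, :positiveIncrease])  # keep only the columns we need
dropmissing!(df)                                 # drop rows with a missing value in any kept column

# Combine elementwise conditions with .& to get one boolean row mask
ma = df[(df.date .> Date(2020, 3, 15)) .& (df.state .== "MA"), :]
sort!(ma, :date)  # oldest first, an alternative to reverse

println(ma)
```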

p_sma7(x; args...) = plot!(sma(Float64.(x), n=7), label="7d avg.", lw=3; args...)
function p_state(state="MA")
@pipe df_States[
    (df_States.date .> Date(2020,3,15)) .& (df_States.state .== state) , :] |> 
    begin
        bar(_.positiveIncrease |> reverse, title="$state New Cases (day since 3/15)", label=nothing)
        p_sma7(_.positiveIncrease |> reverse)
    end
end

plot(
    p_state.(["CA", "FL", "TX", "MA", "PA", "GA"])...,
    size=(1200,900),
    layout=(3,2),
    legend=:topleft,
    ylims=(0,1e4), dpi=320
)
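The p_state.([...])... call does two things at once: the dot broadcasts p_state over the vector of state codes, and the trailing ... splats the resulting vector of plots into separate positional arguments for plot. Here's the same idiom with hypothetical stand-in functions that return strings instead of plots, so it runs without Plots.jl:

```julia
# Hypothetical stand-in for p_state: returns a label instead of a plot
describe_state(state) = "panel for $state"

# Hypothetical stand-in for plot: accepts any number of panels
combine_panels(panels...) = join(panels, " | ")

states = ["CA", "FL", "TX"]

# Broadcast to get one result per state, then splat them as separate arguments
println(combine_panels(describe_state.(states)...))
# panel for CA | panel for FL | panel for TX
```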