Julia Mini Tutorial


Before anything, I'd like to recommend Learn X in Y minutes (where x=Julia), especially if you're already sold about using Julia and just want to jump-start.

Also, if you're coming from Python or C++ or MATLAB background, check out the Noteworthy Differences page, from the manual, and a cheat sheet. Btw, the manual itself is well written and concise, I recommend going through casually.

Introduction

There are many dynamic programming languages out there, from MATLAB to Python or even Mathematica. There are also many "fast" languages out there, most, however, are not dynamic. The stated goal of Julia-lang is to solve the two language program and make convenience of dynamic language and the speed of "hard" languages (think of C, C++) into one easy to write, maintain, and powerful language.

Before you turn and say "well, Numpy is fast and dynamic", I'd like to highlight that, Numpy is 50% C, and if you're anything above "just using it", you will soon hit complex code buried in C and your python "niceness" is gone completely.

Bit digression

For me, using and hacking Julia as a process actually tought me so much about programming (esp. high-performance computing (HPC)) in general, I will showcase a few tricks in Julia that helps you to quickly learn from others && gain insight as to what your computer is actually doing when you run a function, make a loop etc.

Basics

Not sure if I should state how to use REPL (interactive session) first or just dive into the language. I'm lazy so here we go!

Variable assignment and basic types

x = 2+2
y = 3÷2
z = 3//2
PIE = π

@show x,y,z
@show typeof(z)
@show typeof(π)
(x, y, z) = (4, 1, 3//2)
typeof(z) = Rational{Int64}
typeof(π) = Irrational{:π}

Julia is a language designed for general purpose but emphasize on technical computing, so the Type system is superb, notice, Julia is not a OO (Object-oriented) language, we will see soon how Multi-dispatch is actually better off, for almost all science/technical computing use case.

We can explore the type structure a bit:

@show supertype(Integer)
@show supertype(Rational)
@show supertype(Real)
@show supertype(Number)
supertype(Integer) = Real
supertype(Rational) = Real
supertype(Real) = Number
supertype(Number) = Any

Types are capitalized (a convention). Types are great for making your program as genera as possible while keeping performance && extensibility for others building on top of your package.

Function definition and friends

Julia does not rely on indentation, one good thing is that you won't be be 100 lines into a function and not sure if you missed an indentation causing your if branch to be in the wrong place or not.

f(x) = x^2 # one-liner
function f_long(x=3.16) # default value
    x^2 # `return` can be omitted, last line will be returned
end

f(3.16) == f_long()
true

To construct keyword function or anonymous function, do the follwoing:

anony = (x,y) -> x+y # multi variable anonymous function
function k_func(x; key1)
    key1^x
end

@show anony(3, π)
# String are multiplied because "a"*"b" == "ab" like x*y == xy
@show k_func(2, key1 = "ke")
anony(3, π) = 6.141592653589793
k_func(2, key1 = "ke") = "keke"

Multi-dispatch

So, it's like C++'s overload, but better (more flexible and less manual labor). There was a heated debate a while ago on Julia discourse, I won't cover it but I'd like to cite an example to show that Multi-dospatch is different from say, C++ oervloading. The example basically shows the essence:

Multi-dispatch is resolved at run-time and always behaves as if it were fully dynamic. [1]

Now let's see a more canonical example (which, can be done with C++ overloading, but the labor exploeds exponentially in real-world case):

function multi_fun(x::Integer, y::String)
    y^x
end
multi_fun(2, "ΨΣΧ")

"ΨΣΧΨΣΧ"
now, clearly, if x is not an integer, this is not well-defined (you can't repeat a String 0.3 times). But there may be an analogous operation to it. This is a common pattern in science, where we use the same symbol for similar but actually different operations all the time!

function multi_fun(x::AbstractFloat, y::String)
    length(y)^x
end
multi_fun(1.5, "ΨΣΧ")
5.196152422706632

What happens behind the scene is that Julia will pick the "most specific version" when you call a function and "dispatch" to that method. Notice, f(x) is the same as f(x::Any) – no limitation on what x can be.

Now you may ask, but then this is ugly, this reminds me of C++! It does, a bit, but the difference is, you don't need to enumerate all the possible types combination for your function. And, writing generic code does NOT imply performance penalty, for example, the following function is "type-stable" (i.e compilers can optimize):

stable(x::Number) = zero(x)
stable(x::Array) = zero(eltype(x))
@which stable(1)
stable(x::Number) in Main.FD_SANDBOX_17593669135483838073 at none:1
@which stable(1.0)
stable(x::Number) in Main.FD_SANDBOX_17593669135483838073 at none:1
@which stable([1.0, 2.0])
stable(x::Array) in Main.FD_SANDBOX_17593669135483838073 at none:1

you can check with @code_warntype to see if they are stable. They are, because zero will return the 'correct' zero-element for the type you passed in, and, dispatching on Number vs. Array allow you to to same thing across different data types.

Most importantly, Number is an Abstract Type, so when we call the function with 1 a version of stable with x::Int64 is compiled, when we call it with 1.0, a version with x::Float64 is compiled etc. So you can write generic yet performant code with ease.

Finally, I'd like to mentioned an excellent talk on the incredible extensiablity of Julia eco system thanks to 1) not-OO (meaning no private class functions you have to monkey patch) 2) multi-dispatch allows extension on Struct data types user defines.

Arrays

Array is am important data type for technical computing because how much we care about data and so on. In Julia Array means N-dimensional, and, 1-D and 2-D Array are given convenient names: Vector, Matrix:

ary = [0,0,0]
typeof(ary) == Vector{Int}
true

You can 'prepare' and empty array with known type and size (common practice in HPC):

using Latexify
empty_ary = Array{Float32, 2}(undef, 2,2)
ls = latexify(empty_ary)
print(ls)
UndefVarError: s not defined
# the bang (!) is a convension for non-pure function (which may mutates input)
fill!(empty_ary, Float32(9))
empty_ary
2×2 Array{Float32,2}:
 9.0  9.0
 9.0  9.0

Speaking of array, broadcast is an important feature in Julia, essentially, you can do element-wise opertaion by just writing the 'scalar' operation:

h(x) = x^2
h.([1,2,3])
3-element Array{Int64,1}:
 1
 4
 9

This is nice in two ways:

  1. you don't need to be a genius to plan for arrays ahead

  2. eliminates ambiguity when you're looking at a piece of code

  3. (hidden: easy for compiler to optimize this into SIMD or perform auto unroll)

Loop and other clauses

This part appears this late because I want to play with array, and, this part is very canonical (just like python, let's say) so I don't expect you guys to have difficulty with it:

seq = Float64.(1:10) # [1,2,3...10], but make them Float64 for later in-place change

for i in eachindex(seq)
    if seq[i] > 5
        seq[i] *= 2
    else
        seq[i] /= 2
    end
    nothing
end

seq
10-element Array{Float64,1}:
  0.5
  1.0
  1.5
  2.0
  2.5
 12.0
 14.0
 16.0
 18.0
 20.0
let head = first(seq) # `let` and `begin` are good friends, 
#`let` allows you to initialize a local variable
    while head < 10
        popfirst!(seq)
        head = first(seq)
    end
end

seq
5-element Array{Float64,1}:
 12.0
 14.0
 16.0
 18.0
 20.0

Of course, many of these dummy task have built-in solutions:

@show sum(abs2, 1:3)
filter(x->x<10, 1:4:20)
sum(abs2, 1:3) = 14
3-element Array{Int64,1}:
 1
 5
 9
filter(x->x<10, 1:4:20)
3-element Array{Int64,1}:
 1
 5
 9

Extra bits on loops and arrays (SIMD)

using BenchmarkTools

ary = rand(1000) # an array of 1000 random float64 between 0-1
function my_sum(a)
    s = zero(eltype(a))
    for i in a
        s += i
    end
    return s
end

function my_sum_simd(a)
    s = zero(eltype(a))
    @simd for i in a
        s += i
    end
    return s
end
display(@btime my_sum($ary))
@btime my_sum_simd($ary)
  694.857 ns (0 allocations: 0 bytes)
  43.132 ns (0 allocations: 0 bytes)

We see a significant speed-up just by 'annotating' that we want the loop to be SIMDed, of course, due to addition rearranging, the result would be slightly different:

(my_sum(ary), my_sum_simd(ary))
(495.71494292361746, 495.7149429236172)

How do we know under the hood SIMD is happening? Ah yes, Julia shows you da way:

@code_lowered my_sum_simd(ary)
CodeInfo(
1 ─ %1  = Main.FD_SANDBOX_17593669135483838073.eltype(a)
│         s = Main.FD_SANDBOX_17593669135483838073.zero(%1)
│         r#354 = a
│   %4  = Base.simd_outer_range
│   %5  = (%4)(r#354)
│         @_4 = Base.iterate(%5)
│   %7  = @_4 === nothing
│   %8  = Base.not_int(%7)
└──       goto #8 if not %8
2 ┄ %10 = @_4
│         i#355 = Core.getfield(%10, 1)
│   %12 = Core.getfield(%10, 2)
│   %13 = Base.simd_inner_length
│   %14 = r#354
│         n#356 = (%13)(%14, i#355)
│   %16 = Main.FD_SANDBOX_17593669135483838073.zero(n#356)
│   %17 = %16 < n#356
└──       goto #6 if not %17
3 ─       i#357 = Main.FD_SANDBOX_17593669135483838073.zero(n#356)
4 ┄ %20 = i#357 < n#356
└──       goto #6 if not %20
5 ─ %22 = Base.simd_index
│   %23 = r#354
│   %24 = i#355
│         i = (%22)(%23, %24, i#357)
│         s = s + i
│         i#357 = i#357 + 1
│         $(Expr(:loopinfo, Symbol("julia.simdloop"), nothing))
└──       goto #4
6 ┄       @_4 = Base.iterate(%5, %12)
│   %31 = @_4 === nothing
│   %32 = Base.not_int(%31)
└──       goto #8 if not %32
7 ─       goto #2
8 ┄       Main.FD_SANDBOX_17593669135483838073.nothing
└──       return s
)

This show the lowered code representation of your function (with the given argument). well, there's %13 = Base.simd_inner_length and so on, which is a good hint. If you dig deeper into @code_llvm which will show the emitted LLVM bitcodes or even near assembly code with @code_native, you will see:

...
; ││└
; ││ @ simdloop.jl:75 within `macro expansion'
	vaddpd	%ymm0, %ymm1, %ymm0
	vaddpd	%ymm0, %ymm2, %ymm0
	vaddpd	%ymm0, %ymm3, %ymm0
	vextractf128	$1, %ymm0, %xmm1
...

Which are the instruction for SIMD addition.

[1] https://discourse.julialang.org/t/julia-isnt-multiple-dispatch-but-overloading/42370/14