Berkeley Earth publishes a unique dataset of global temperature measurements. Below is a guide to downloading the data and starting to analyze it with Python. All code can be found in this gist.

Berkeley Earth air temperature measurements above sea ice

Download the .txt file from the “Land + Ocean (1850 — Recent)” section of the Berkeley Earth data website and read it with the following Python code:

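import pandas as pd

# (start, end) character positions of each fixed-width column
colspecs = [(2, 6), (10, 12), (14, 22), (24, 29)]
df = pd.read_fwf(
    "{folder}/Land_and_Ocean_complete.txt",  # replace {folder} with your download directory
    colspecs=colspecs,
    header=85,  # row used as the header; everything above it is skipped
)
df.columns = ["year", "month", "anomaly_C", "confidence_95_C"]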

colspecs defines the character positions of each fixed-width column as half-open (start, end) intervals, so (2, 6) covers the year field in the source text file.
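To make the interval convention concrete, here is a minimal sketch that applies the same column positions to a single line with plain string slicing. The row values are made up for illustration and are not real Berkeley Earth data; the sketch only mirrors how the positions are interpreted, not how pandas parses the file internally.

```python
# Hypothetical row, padded so each field sits at the column positions used above.
line = f"  {1850:4d}    {1:2d}  {-0.801:8.3f}  {0.482:5.3f}"

colspecs = [(2, 6), (10, 12), (14, 22), (24, 29)]
names = ["year", "month", "anomaly_C", "confidence_95_C"]

# Each (start, end) pair is half-open, so it behaves like the slice line[start:end].
fields = {
    name: line[start:end].strip()
    for name, (start, end) in zip(names, colspecs)
}

record = {
    "year": int(fields["year"]),
    "month": int(fields["month"]),
    "anomaly_C": float(fields["anomaly_C"]),
    "confidence_95_C": float(fields["confidence_95_C"]),
}
print(record)
```

If a column comes back truncated or shifted, printing a raw line next to its slices like this is a quick way to check the positions.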


This article is an extension of my previous article describing a similar deployment process using native AWS Lambda tools. However, Amazon has since started supporting container images and updated its pricing policy to 1ms granularity. Both are major developments that improve tooling and make small deployments cost-effective.

Deploying AWS Lambda using a container

My previous article focused on the logic of the code and didn’t address how to actually deploy the function because that was well covered by AWS in its many tutorials. Here I explore the new container deployment options while keeping all business logic untouched.

Please review the AWS tutorial on deploying a generic…


This article is a follow-up to my previous tutorial on how to set up Google Colab and auto-sklearn. Here, I go into more detail and show auto-sklearn's performance on an artificially created dataset. The full notebook gist can be found here.

First, I generated a regression dataset using scikit-learn.

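from sklearn.datasets import make_regression

X, y, coeff = make_regression(
    n_samples=1000,
    n_features=100,
    n_informative=5,
    noise=0,
    shuffle=False,  # keep the informative features in the first columns
    coef=True,  # also return the true coefficients
)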
Subset of 100 generated features

This generates a dataset with 100 numerical features where the first 5 features are informative (these are labeled as “feat_0” to “feat_4”). The rest (“feat_5” to “feat_99”) are random noise. …
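One way to check which features are informative is to inspect the coefficients that make_regression returns when coef=True. The sketch below is not taken from the notebook: random_state=0 is an assumption added so the run is repeatable, and the feat_N names are rebuilt here from the labels mentioned above.

```python
import numpy as np
from sklearn.datasets import make_regression

# random_state is an assumption added here so the run is repeatable.
X, y, coeff = make_regression(
    n_samples=1000, n_features=100, n_informative=5,
    noise=0, shuffle=False, coef=True, random_state=0,
)

# Rebuild the feature labels and keep those with a nonzero true coefficient.
feature_names = [f"feat_{i}" for i in range(X.shape[1])]
informative = [name for name, c in zip(feature_names, coeff) if c != 0]
print(informative)

# With noise=0 the target is an exact linear combination of the features.
print(np.allclose(X @ coeff, y))
```

Because shuffle=False, the nonzero coefficients sit in the first columns, which gives a known ground truth to compare any fitted model against.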


This alphabetically sorted collection of AI, ML, and data resources was last updated on 3/26/2021.

ML breakdown: Supervised + Unsupervised + RL


AutoML is fast becoming a popular way to build minimum viable models for new projects. A popular Python library is Auto-sklearn, which builds on scikit-learn, the most popular Python ML library. Auto-sklearn runs a smart search over scikit-learn models and parameters to find the best-performing ensemble of models.

Logos of Google Drive + Colab + Scikit-learn + Auto-sklearn

This tutorial describes how to set up Auto-sklearn on Google Colab. The complete notebook gist includes a toy project that uses an old Airbnb dataset from Kaggle.

The key first step is to install Linux dependencies alongside Auto-sklearn:

!sudo apt-get install build-essential swig
!pip install auto-sklearn==0.11.1

After running these commands in…


Google released a white paper describing how the company intends to generate all of its electricity needs from renewable energy sources by 2030. Previously, Google committed to reducing emissions by buying offsets or generating renewable energy off-cycle. This new commitment goes further: “Google intends to match its operational electricity use with nearby carbon-free energy sources in every hour of every year.”
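A toy calculation shows why matching in every hour is a stricter target than matching totals. The hourly numbers below are made up, and the score used here (matched energy divided by load) is an assumption chosen for illustration, not a definition taken from the white paper.

```python
# Hypothetical hourly values in MWh for four hours; not real data.
load = [10, 10, 10, 10]          # electricity consumed by a data center
carbon_free = [0, 5, 25, 10]     # carbon-free energy available nearby

# Matching on totals: surplus in one hour can offset a shortfall in another.
total_match = min(sum(carbon_free), sum(load)) / sum(load)

# Matching hour by hour: surplus in one hour does not count toward another hour.
matched_per_hour = [min(l, c) for l, c in zip(load, carbon_free)]
hourly_match = sum(matched_per_hour) / sum(load)

print(f"total match:  {total_match:.1%}")
print(f"hourly match: {hourly_match:.1%}")
```

In this example the totals are fully covered, while only part of the load is covered hour by hour, because the surplus in the third hour cannot serve the first two.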

Google’s energy journey

Everybody interested should read it — it’s short.

Google cooperated with WattTime to generate the dataset that measures the carbon emissions intensity in regions where Google’s data centers are located. WattTime has a very interesting API providing…


While AWS Lambda functions are typically used to build API endpoints, at their core Lambda functions can return almost anything. This includes HTML markup with dynamic content.

AWS Lambda + Python + Jinja

I will not go into detail on how to deploy AWS Lambda functions. Please see the official documentation. I will, however, describe how to return dynamic HTML content instead of the typical JSON.
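As a rough sketch of the idea (not the code from this article), a handler can return a response whose body is HTML and whose Content-Type header says so. This assumes an API Gateway proxy-style event and response shape; the "name" query parameter and the page template are hypothetical, and string.Template from the standard library stands in for Jinja so the example has no dependencies.

```python
import html
from string import Template

# Hypothetical page; string.Template is a stand-in for a Jinja template.
PAGE = Template("<html><body><h1>Hello, $name!</h1></body></html>")


def lambda_handler(event, context):
    """Return an HTML page instead of a JSON payload."""
    # Assumes a proxy-style event; "name" is a made-up query parameter.
    params = event.get("queryStringParameters") or {}
    # Escape user input before placing it in the markup.
    name = html.escape(params.get("name", "world"))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/html; charset=utf-8"},
        "body": PAGE.substitute(name=name),
    }


# Local call with a fake event; no AWS account is needed to try it.
response = lambda_handler({"queryStringParameters": {"name": "Ada"}}, None)
print(response["body"])
```

Swapping in Jinja would mean replacing PAGE.substitute with a rendered Jinja template; the response dictionary stays the same.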

Step 0 — Optional

If you prefer to develop and test Lambda functions locally (as I do), you can use Docker to simulate the AWS Lambda function environment. A sample Dockerfile I use is below.

FROM amazonlinux:latest
RUN mkdir -p /mnt/app
ADD …

I have gone through many iterations of what my preferred scikit-learn custom pipeline looks like. As of 6/2020, here is my latest iteration.

scikit-learn logo

In general, a machine learning pipeline should have the following characteristics:

  1. Include every step shared between training and scoring to ensure consistency. The pipeline does not need to include one-off steps such as removing duplicates, which would not be relevant at scoring time.
  2. Have as few custom components as possible. For example, when filling missing values in numerical columns with the median value, there is no reason NOT to use the SimpleImputer class from sklearn.impute. In my example gist…
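The two characteristics above can be sketched with built-in components only. This is a minimal illustration, not the pipeline from the gist: the toy data is made up, and LinearRegression is an arbitrary choice of final estimator.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data with one missing value in the numerical column.
X_train = np.array([[1.0], [2.0], [np.nan], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])

# Imputation lives inside the pipeline, so training and scoring share it.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("model", LinearRegression()),
])
pipeline.fit(X_train, y_train)

# At scoring time, missing values are filled with the median learned in training.
X_new = np.array([[np.nan], [3.0]])
predictions = pipeline.predict(X_new)
print(pipeline.named_steps["impute"].statistics_)
print(predictions)
```

Because the median is learned during fit and reused during predict, a missing value at scoring time is handled the same way as in training without any custom code.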

COVID-19 became a serious concern for the wider public in the USA somewhere between February and March of 2020. Today, on May 16, stock markets in the USA are optimistically higher than a year ago while most of the country is still under strict lockdown restrictions.

S&P 500 May 16, 2019 — May 16, 2020

Uncertainty around the virus is still high and a vaccine seems to be 12–18 months away. The region closest to herd immunity is NYC, where the virus has spread to 20% of the population, but that is nowhere near herd immunity requirements. …


With the unemployment rate in the US jumping to an alarming 14.7% from less than 4% in just one quarter, the COVID-19 crisis has created a gap between those who are allowed to work and those who are not.

Industry classification by job security during the COVID crisis

When people talk about inequality, they typically discuss absolute wealth levels at a point in time. This is similar to looking at a company’s balance sheet. However, I’ll argue that looking at a person’s freedom to continue working is more important, not unlike looking at a company’s cash flow.

I classified industries into 4 quadrants depending on whether they are mission critical to…

Adam Novotny
