Download the .txt file from the "Land + Ocean (1850 — Recent)" section of the Berkeley Earth data website and read it with the following Python code:
import pandas as pd

colspecs = [(2, 6), (10, 12), (14, 22), (24, 29)]  # year, month, anomaly, 95% CI
df = pd.read_fwf(
    "path/to/berkeley_earth_land_and_ocean.txt",  # adjust to your download location
    colspecs=colspecs,
    header=None,
)
df.columns = ["year", "month", "anomaly_C", "confidence_95_C"]
colspecs lists the (start, end) character positions of each column in the fixed-width file, so (2, 6) marks the year field in the source text file.
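To sanity-check the column positions without downloading the file, you can parse a small inline sample with the same colspecs (the rows below are fabricated for illustration, not real Berkeley Earth values):

```python
import io

import pandas as pd

# Two fabricated rows laid out to match the colspecs:
# (2, 6) = year, (10, 12) = month, (14, 22) = anomaly, (24, 29) = 95% CI
sample = (
    "  1850     1    -0.802  0.441\n"
    "  1850     2    -0.315  0.458\n"
)

colspecs = [(2, 6), (10, 12), (14, 22), (24, 29)]
df = pd.read_fwf(io.StringIO(sample), colspecs=colspecs, header=None)
df.columns = ["year", "month", "anomaly_C", "confidence_95_C"]
print(df)
```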
This article is an extension of my previous article, which described a similar deployment process using native AWS Lambda tools. Since then, Amazon has started supporting container images and updated its pricing policy to 1 ms granularity. Both are major developments that improve tooling and make small deployments cost-effective.
My previous article focused on the logic of the code and didn't address how to actually deploy the function, because that was already well covered by AWS's many tutorials. Here I explore the new container deployment options while keeping all business logic untouched.
Please review the AWS tutorial on deploying a generic…
This article is a follow-up to my previous tutorial on how to set up Google Colab and auto-sklearn. Here, I go into more detail and show auto-sklearn's performance on an artificially created dataset. The full notebook gist can be found here.
First, I generated a regression dataset using scikit-learn.

from sklearn.datasets import make_regression

X, y, coeff = make_regression(
    n_samples=1000,   # placeholder; the excerpt cuts off before the arguments
    n_features=100,
    n_informative=5,
    shuffle=False,    # keep the informative features first (feat_0 .. feat_4)
    coef=True,        # also return the true coefficients
)
This generates a dataset with 100 numerical features where the first 5 features are informative (these are labeled as “feat_0” to “feat_4”). The rest (“feat_5” to “feat_99”) are random noise. …
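A quick sanity check (my own addition, using plain scikit-learn rather than Auto-sklearn) confirms this structure: only the first five entries of the returned coefficient vector are nonzero, and an ordinary least-squares fit recovers them from the noise-free data. The n_samples and random_state values are assumptions, not the article's settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Recreate the setup described above: 100 features, 5 informative.
# shuffle=False keeps the informative columns first, matching feat_0..feat_4.
X, y, coeff = make_regression(
    n_samples=500,
    n_features=100,
    n_informative=5,
    shuffle=False,
    coef=True,
    random_state=42,
)

# Only the first five entries of the true coefficient vector are nonzero.
print(np.flatnonzero(coeff))  # prints [0 1 2 3 4]

# With no added noise, ordinary least squares recovers those coefficients.
lin = LinearRegression().fit(X, y)
print(np.allclose(lin.coef_, coeff, atol=1e-6))  # prints True
```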
This alphabetically sorted collection of ML and data resources was last updated on 4/27/2021.
AutoML is fast becoming a popular way to build minimum viable models for new projects. A popular Python library is Auto-sklearn, which leverages scikit-learn, the most popular Python ML library. Auto-sklearn runs a smart search over scikit-learn models and parameters to find the best-performing ensemble of models.
The key first step is to install the Linux dependencies alongside Auto-sklearn:
!sudo apt-get install build-essential swig
!pip install auto-sklearn==0.11.1
After running these commands in…
Google released a white paper describing how the company intends to meet all of its electricity needs with renewable energy sources by 2030. Previously, Google committed to reducing emissions by buying offsets or generating renewable energy off-cycle. This new commitment goes further: "Google intends to match its operational electricity use with nearby carbon-free energy sources in every hour of every year."
Everybody interested should read it — it’s short.
Google cooperated with WattTime to generate the dataset that measures carbon emissions intensity in the regions where Google's data centers are located. WattTime has a very interesting API providing…
While AWS Lambda functions are typically used to build API endpoints, at their core Lambda functions can return almost anything. This includes returning HTML markup with dynamic content.
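As a minimal sketch of the idea, here is a handler that returns an HTML page. The event shape assumes an API Gateway proxy integration or Lambda function URL, and the "name" query parameter is an invented example:

```python
import html


def handler(event, context):
    # Pull a value out of the request to render dynamically; the "name"
    # query parameter is an assumed example, not a fixed API.
    params = (event or {}).get("queryStringParameters") or {}
    name = html.escape(params.get("name", "world"))

    body = f"<html><body><h1>Hello, {name}!</h1></body></html>"
    return {
        "statusCode": 200,
        # Content-Type tells API Gateway / the browser to render this as a
        # web page instead of the default JSON response.
        "headers": {"Content-Type": "text/html"},
        "body": body,
    }
```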
If you prefer to develop and test Lambda functions locally (as I do), you can use Docker to simulate the AWS Lambda environment. A sample Dockerfile I use is below.
FROM amazonlinux:latest
RUN mkdir -p /mnt/app
ADD …
I have gone through many iterations of what my preferred scikit-learn custom pipeline looks like. As of 6/2020, here is my latest iteration.
In general, a machine learning pipeline should have the following characteristics:
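Whatever the exact list of characteristics, the skeleton I keep coming back to is a custom transformer plus a Pipeline. The transformer and column choices below are illustrative, not the article's actual code:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Keep only the given column indices (a typical custom pipeline step)."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.columns]


pipe = Pipeline([
    ("select", ColumnSelector(columns=[0, 1, 2])),  # illustrative columns
    ("scale", StandardScaler()),
    ("model", Ridge()),
])

# Fit on toy data: the pipeline applies selection, scaling, and the model
# in order, and the whole object can be cross-validated or pickled as one.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X[:, 0] * 2.0 + 1.0
pipe.fit(X, y)
```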
COVID-19 became a serious concern for the wider public in the USA sometime between February and March of 2020. Today, May 16, stock markets in the USA are optimistically higher than a year ago, while most of the country is still under strict lockdown restrictions.
Uncertainty around the virus is still high, and a vaccine seems to be 12–18 months away. The region closest to herd immunity is NYC, where the virus has spread to about 20% of the population. But that is nowhere near herd-immunity requirements. …
With the unemployment rate in the US jumping to an alarming 14.7% from less than 4% in just one quarter, the COVID-19 crisis has created a gap between those who are allowed to work and those who are not.
When people talk about inequality, they typically discuss absolute wealth levels at a point in time, much like looking at a company's balance sheet. However, I'll argue that a person's freedom to continue working matters more, not unlike a company's cash flow.
I classified industries into 4 quadrants depending on whether they are mission-critical to…