Projects

A summary of the various projects I have worked on, listed in alphabetical order.

Awesome Sentiment Analysis

A curated list of Sentiment Analysis methods, implementations, and miscellaneous resources. It is too shallow to count as a scientific survey, but good enough as a pointer for studying Sentiment Analysis and Natural Language Processing in general.

Sentiment Analysis is the field of study that analyzes people’s opinions, sentiments, evaluations, attitudes, and emotions from written languages. (Liu 2012)

Decodeswitch (codeswitching language identification for English-Spanish)

Identify the language of each word in a sentence that mixes English and Spanish.

Codeswitching (CS) is a widely observed phenomenon in social media where people communicate in two or more languages interchangeably (Spanish and English, for example). Codeswitching is common among bilingual speakers, both in speech and in writing. Identifying the languages in a codeswitched input is a crucial first step before applying other natural language processing algorithms.

This system, built with a Conditional Random Field (CRF) and fastText word vectors, participated in the shared task of the Second Workshop on Computational Approaches to Codeswitching.
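To make the word-level tagging setup concrete, here is a minimal sketch of the kind of per-token features a CRF tagger could consume. The feature names and the `tokenFeatures` helper are hypothetical and chosen for illustration only; the actual feature set of the system is not described here, and the fastText vectors are left out entirely.

```typescript
// Hypothetical per-token features for a word-level language tagger.
// This is an illustrative assumption, not the system's real feature set.
type TokenFeatures = Record<string, string | boolean>;

function tokenFeatures(tokens: string[], i: number): TokenFeatures {
  const word = tokens[i];
  const lower = word.toLowerCase();
  return {
    lower: lower,
    // Character affixes often hint at the language of a word.
    prefix3: lower.slice(0, 3),
    suffix3: lower.slice(-3),
    isCapitalized: /^[A-Z]/.test(word),
    // Accented vowels, n-tilde, u-diaeresis, inverted ? and ! (written as escapes).
    hasSpanishChar: /[\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1\u00fc\u00bf\u00a1]/.test(lower),
    // Neighbouring words give the sequence model some context.
    prev: i > 0 ? tokens[i - 1].toLowerCase() : "<s>",
    next: i < tokens.length - 1 ? tokens[i + 1].toLowerCase() : "</s>",
  };
}

// Features for the last word of a mixed sentence ("ma\u00f1ana" is "mañana").
const exampleFeatures = tokenFeatures(["I", "will", "call", "you", "ma\u00f1ana"], 4);
```

A sequence model would receive one such feature map per token and predict one language label per token.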

Expostal

Elixir binding for Libpostal - a library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

As a by-product of this work, I wrote a tutorial on extending Erlang/Elixir with the C programming language.

FastText for Windows

FastText is a library for efficient learning of word representations and sentence classification. It implements the algorithms described in Bojanowski et al. (2016) and Joulin et al. (2016). I used it in my previous research (Xia 2016) to aid language identification in codeswitched sentences.

FastText’s authors do not currently support building and running on Windows. There have been attempts to run it with Bash for Windows and MinGW. This unofficial build is compiled with Visual C++ 2015 to run natively on Windows. A pull request has been submitted upstream. Meanwhile, I will be maintaining this binary distribution.

NS (news summarizer)

An automatic news summary generator. Using the Bing Cognitive Services API, we get a list of URLs from multiple news sources for a given keyword. We then extract the article text from the web pages and generate a summary with the SumBasic algorithm (Nenkova and Vanderwende 2005).
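The summarization step can be sketched roughly as follows. This is a simplified take on SumBasic, not the project's actual code: the `tokenize` and `sumBasic` helpers are hypothetical, tokenization is a naive assumption (lowercased ASCII letters only), and the input is assumed to be already split into sentences.

```typescript
// Naive tokenizer (assumption): lowercase, keep runs of ASCII letters.
function tokenize(sentence: string): string[] {
  return sentence.toLowerCase().match(/[a-z]+/g) || [];
}

function sumBasic(sentences: string[], maxSentences: number): string[] {
  const tokens = sentences.map(tokenize);

  // Step 1: unigram probabilities over the whole input.
  const prob: Record<string, number> = Object.create(null);
  let total = 0;
  tokens.forEach((words) =>
    words.forEach((w) => {
      prob[w] = (prob[w] || 0) + 1;
      total += 1;
    })
  );
  Object.keys(prob).forEach((w) => {
    prob[w] = prob[w] / total;
  });

  const chosen: number[] = [];
  const used = sentences.map(() => false);

  while (chosen.length < maxSentences) {
    const candidates = tokens
      .map((_, i) => i)
      .filter((i) => !used[i] && tokens[i].length > 0);
    if (candidates.length === 0) break;

    // Step 2: find the currently most probable word among unused sentences.
    let topWord = "";
    let topProb = -1;
    candidates.forEach((i) =>
      tokens[i].forEach((w) => {
        if (prob[w] > topProb) {
          topProb = prob[w];
          topWord = w;
        }
      })
    );

    // Step 3: among sentences containing that word, pick the one with
    // the highest average word probability.
    let best = -1;
    let bestScore = -1;
    candidates.forEach((i) => {
      if (tokens[i].indexOf(topWord) < 0) return;
      const score =
        tokens[i].reduce((sum, w) => sum + prob[w], 0) / tokens[i].length;
      if (score > bestScore) {
        bestScore = score;
        best = i;
      }
    });
    used[best] = true;
    chosen.push(best);

    // Step 4: square the probability of each word in the chosen sentence,
    // so that later picks favour content not yet covered.
    const seen: Record<string, boolean> = Object.create(null);
    tokens[best].forEach((w) => {
      if (!seen[w]) {
        prob[w] = prob[w] * prob[w];
        seen[w] = true;
      }
    });
  }
  return chosen.map((i) => sentences[i]);
}
```

Squaring the probabilities after each pick is what discourages redundancy: a word that has already appeared in the summary contributes much less to the score of later sentences.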

Pyenm (Simultaneous Pinyin + English input method)

Type English and Mandarin without switching input method.

Unlike English, Chinese characters cannot be typed directly on a Latin-script keyboard. Mandarin speakers often type in Pinyin, the most widely used romanization scheme for Mandarin, and a Pinyin Input Method Editor (IME) converts the Pinyin into Chinese characters. To type both English and Mandarin in the same sentence, users must repeatedly toggle the IME on and off, which is a major inconvenience.

Schemats (No-ORM type-safe SQL query in Node.js)

The widespread use of Object Relational Mapping (ORM) gives the impression that ORMs are essential for writing applications that interact with a database. Schemats proposes a new approach to implementing statically typed, PostgreSQL-backed services in Node.js.

Using Schemats, you can automatically generate TypeScript interface definitions from a (Postgres) SQL database schema.

Start with a database schema:

Users
id SERIAL
username VARCHAR
password VARCHAR
last_logon TIMESTAMP

And have the following TypeScript interface generated automatically:

interface Users {
    id: number;
    username: string;
    password: string;
    last_logon: Date;
}
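To illustrate the idea of deriving an interface from a schema, here is a minimal sketch. It is not Schemats' actual implementation or API: `ColumnDef`, `toTsType`, and `generateInterface` are hypothetical names, the column list is hard-coded in place of whatever metadata the tool reads from the database, and the type mapping covers only a small assumed subset of Postgres types.

```typescript
// Hypothetical description of one table column (illustration only).
interface ColumnDef {
  name: string;
  sqlType: string;
}

// Assumed mapping for a handful of Postgres types; anything else falls back to `any`.
function toTsType(sqlType: string): string {
  switch (sqlType.toUpperCase()) {
    case "SERIAL":
    case "INTEGER":
      return "number";
    case "VARCHAR":
    case "TEXT":
      return "string";
    case "TIMESTAMP":
      return "Date";
    case "BOOLEAN":
      return "boolean";
    default:
      return "any";
  }
}

// Render a TypeScript interface declaration as source text.
function generateInterface(table: string, columns: ColumnDef[]): string {
  const fields = columns.map((c) => "    " + c.name + ": " + toTsType(c.sqlType) + ";");
  return ["interface " + table + " {"].concat(fields, ["}"]).join("\n");
}

const usersInterface = generateInterface("Users", [
  { name: "id", sqlType: "SERIAL" },
  { name: "username", sqlType: "VARCHAR" },
  { name: "password", sqlType: "VARCHAR" },
  { name: "last_logon", sqlType: "TIMESTAMP" },
]);
```

With the `Users` columns listed earlier, `usersInterface` holds the source text of the interface shown above, which can then be written to a `.ts` file and imported by query code.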

For an overview of the motivation and rationale behind this project, please take a look at Statically typed PostgreSQL queries in Typescript.

Pure Luck

Now there is a fair way to choose a winner. Everyone, place your finger on the screen!

Simply QR

You feed me text, I give you QR. No BS

SSR-Proxy (Server-Side Rendering Proxy)

Prerender your single page app for better SEO and support on legacy browsers.

SSR-Proxy is an HTTP proxy that you can put in front of your existing Single Page App (SPA) server to achieve server-side rendering. SSR-Proxy takes a different approach to server-side rendering: instead of rendering frontend components in Node.js, it uses an actual headless browser, PhantomJS, to render the SPA and proxies the rendered HTML to the client.

TomatoCapsule

Elegant Pomodoro Timer with Activity Log. Available on all Windows 10 devices: phones, tablets, and desktops.

Undercover Camera

Hidden camera app. Available on all Windows 10 devices: phones, tablets, and desktops.

Undercover Camera, at a glance, appears to be an ebook reading app. However, while you hold down the magnifier button, the “ebook” becomes increasingly transparent, revealing the camera preview behind it.