Data Analyst @ Quadas (July 2017 - August 2017)

2024-01-08


Quadas is a SaaS advertising technology company that offers cloud-based mobile programmatic advertising solutions. It builds transparent mobile DSP (demand-side platform) and data management platforms that strengthen its customers’ data management capabilities, and it provides programmatic advertising technology to digital marketing companies.

1 - Responsibilities

Utilized PySpark to conduct distributed analysis on terabytes of user behavior data on AWS S3.

Because of the large volume of data, I chose PySpark to speed up data I/O. In this step, I loaded the raw user behavior data and then applied feature engineering to turn it into informative per-user features.

raw data:
  1. Device ID: the device identifier as chosen by the DSP
  2. Site ID: website the user visited
  3. …
user features:
  1. avg_reqs_per_day
  2. avg_reqs_per_site
  3. ...
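The feature step can be sketched as a per-device aggregation. The snippet below is plain Python rather than PySpark so it runs anywhere, and it rests on assumptions not in the list above: each raw record is a hypothetical `(device_id, site_id, date)` tuple (the date field is assumed), `avg_reqs_per_day` is taken to mean requests divided by distinct active days, and `avg_reqs_per_site` requests divided by distinct sites visited. In PySpark the same logic would be a `groupBy` on the device ID followed by `agg` with a count and `countDistinct`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical raw record: (device_id, site_id, date).
# The date field is an assumption; the source lists only device ID and site ID.
Request = Tuple[str, str, str]


def user_features(requests: List[Request]) -> Dict[str, Dict[str, float]]:
    """Aggregate raw requests into per-device features (assumed definitions)."""
    total = defaultdict(int)   # number of requests per device
    days = defaultdict(set)    # distinct active days per device
    sites = defaultdict(set)   # distinct sites visited per device
    for device_id, site_id, date in requests:
        total[device_id] += 1
        days[device_id].add(date)
        sites[device_id].add(site_id)
    return {
        device_id: {
            "avg_reqs_per_day": total[device_id] / len(days[device_id]),
            "avg_reqs_per_site": total[device_id] / len(sites[device_id]),
        }
        for device_id in total
    }


if __name__ == "__main__":
    raw = [
        ("d1", "s1", "2017-07-01"),
        ("d1", "s2", "2017-07-01"),
        ("d1", "s3", "2017-07-01"),
        ("d1", "s1", "2017-07-02"),
        ("d2", "s3", "2017-07-01"),
    ]
    print(user_features(raw))
```

Each dictionary pass touches a record once, which mirrors why the PySpark version parallelizes well: the aggregation is keyed only by device ID.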

Improved online advertising effectiveness by 10% by detecting web traffic fraud with an unsupervised, auto-encoder-based approach.

The concept is intuitive. I pre-trained an auto-encoder to encode “real” user data into a lower-dimensional subspace and decode it back. This reconstruction ability does not extend to fraudulent users, so their reconstruction error is much larger, and that error can be used to flag them. The approach is also unsupervised, which suits the problem: we usually have plenty of real user behavior data, whereas labeled fraud data is expensive, hard to obtain, and in some cases unavailable.
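To make the reconstruction-error idea concrete, here is a minimal sketch in plain Python. It is not the original model: it assumes a linear auto-encoder with a single hidden unit, trained by full-batch gradient descent on synthetic two-feature “real” users, and it flags a user when the reconstruction error exceeds the 99th percentile of the training errors. The data generator, the helper names, and the percentile threshold are all illustrative assumptions.

```python
import random
from typing import List, Tuple

Vector = List[float]


def make_normal_users(n: int, seed: int) -> List[Vector]:
    """Hypothetical synthetic 'real' users: two scaled features that are strongly
    correlated (near the line y = 2x) plus a little Gaussian noise."""
    rng = random.Random(seed)
    users = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        users.append([x + rng.gauss(0.0, 0.05), 2.0 * x + rng.gauss(0.0, 0.05)])
    return users


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def reconstruction_error(x: Vector, w_enc: Vector, w_dec: Vector) -> float:
    """Squared error between x and decode(encode(x))."""
    h = dot(w_enc, x)  # encode: project onto a 1-D code
    return sum((xi - wd * h) ** 2 for xi, wd in zip(x, w_dec))  # decode, compare


def train_autoencoder(data: List[Vector], lr: float = 0.05, epochs: int = 300,
                      seed: int = 0) -> Tuple[Vector, Vector]:
    """Full-batch gradient descent on the mean squared reconstruction error
    of a linear auto-encoder with one hidden unit (unsupervised: no labels)."""
    rng = random.Random(seed)
    dim, n = len(data[0]), len(data)
    w_enc = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    w_dec = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    for _ in range(epochs):
        g_enc, g_dec = [0.0] * dim, [0.0] * dim
        for x in data:
            h = dot(w_enc, x)
            r = [xi - wd * h for xi, wd in zip(x, w_dec)]  # residual x - x_hat
            r_dot_dec = dot(r, w_dec)
            for j in range(dim):
                g_dec[j] -= 2.0 * r[j] * h / n          # dLoss/dw_dec[j]
                g_enc[j] -= 2.0 * r_dot_dec * x[j] / n  # dLoss/dw_enc[j]
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec


def fit_threshold(data: List[Vector], w_enc: Vector, w_dec: Vector,
                  quantile: float = 0.99) -> float:
    """Assumed rule: the given quantile of reconstruction errors on real users."""
    errors = sorted(reconstruction_error(x, w_enc, w_dec) for x in data)
    return errors[min(int(quantile * len(errors)), len(errors) - 1)]


def is_fraud(x: Vector, w_enc: Vector, w_dec: Vector, threshold: float) -> bool:
    """Flag users the model cannot reconstruct well."""
    return reconstruction_error(x, w_enc, w_dec) > threshold


if __name__ == "__main__":
    train = make_normal_users(200, seed=1)
    w_enc, w_dec = train_autoencoder(train)
    threshold = fit_threshold(train, w_enc, w_dec)
    for user in ([0.5, 1.0], [1.0, -1.0]):
        print(user, is_fraud(user, w_enc, w_dec, threshold))
```

A linear auto-encoder trained with squared error learns the same subspace as PCA, so it only captures linear structure; a production model would typically be deeper and nonlinear, while the train-on-real-users, threshold-the-error logic stays the same.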