We will discuss what feature engineering is, the main techniques it involves, and how to scale to data sets with 20,000 columns using random forests, SVD, and PCA. We will also demonstrate how to build a service around these techniques to save time and effort when building hundreds of models, and share how we did all this with Spark ML to build logistic regression models, neural networks, Bayesian networks, and more.
Sridhar Alla is a big data expert who helps small and large companies solve complex problems such as data warehousing, governance, security, real-time processing, and high-frequency trading, and establish large-scale data science practices. He is also an agile practitioner and a certified agile DevOps practitioner and implementer. Sridhar started his career as a Storage Software Engineer at Network Appliance in Sunnyvale, then worked as Chief Technology Officer at the cybersecurity firm eIQNetworks in Boston. His most recent role is Director of Data Science & Engineering at Comcast in Philadelphia. Sridhar is an avid presenter at Strata, Hadoop World, Spark Summit, and other conferences. He also provides onsite and online training on several technologies and has several patents filed with the USPTO on large-scale computing and distributed systems. He holds a bachelor's degree in computer science from JNTU, Hyderabad, India, and lives with his wife in New Jersey.
Sridhar has over 18 years of experience writing code in Scala, Java, C, C++, Python, R, and Go. He also has extensive hands-on knowledge of Spark, Hadoop, Cassandra, HBase, MongoDB, Riak, Redis, Zeppelin, Mesos, Docker, Kafka, ElasticSearch, Solr, H2O, machine learning, text analytics, distributed computing, and high-performance computing. He is also a published author of the book “Scala and Spark for Big Data Analytics”.