Camila Javiera Muñoz Navarro
@CamilaJaviera91
Hey!! I'm from Chile and I'm trying to learn the wonders of data engineering
Language Breakdown
Lines of code distribution across 26 owned repositories
I-Shaped Developer
I-shaped Specialist: deep expertise in Python
Collaboration Network
Global Impact visualization
Repos
26
PRs
0
Growth
+18%
Top Collaborators
No collaborator data yet.
Coding Streak
Contribution activity over the past year
Top Repositories
Generate large-scale synthetic datasets using SQL and BigQuery.
This code demonstrates how to integrate PySpark with datasets and perform simple data transformations. It loads a sample dataset using PySpark's built-in functionalities or reads data from external sources and converts it into a PySpark DataFrame for distributed processing and manipulation.
This project is designed to extract sales data from a PostgreSQL database, process it, and use a Random Forest model to predict sales quantities. It also visualizes real and predicted sales for better understanding.
A first look at decision trees and bagging, with the goal of training the model on any dataset from Kaggle (this again relies on the 'connect with Kaggle' project).
This repository provides a set of scripts to extract data from a MySQL database, transform it into a CSV file, and integrate it with Google Sheets. The workflow includes database connection, querying, data transformation, and file generation.
This project includes two main scripts: `cvs_to_sheets.py` and `google_sheets_utils.py`. These scripts read data from Google Sheets, clean and analyze it, and generate charts in a PDF file. The processed results can also be saved back to Google Sheets.
With this code you can search for and download datasets from Kaggle.
This repository contains a collection of distance and similarity metric methods and algorithms designed to quantify how closely two text fields (strings) from two different data files resemble each other.
This project simulates a modern data pipeline architecture, entirely locally. It follows a modular design to extract, transform, load, validate, and analyze synthetic sales data using Python, Apache Beam, DuckDB, and PostgreSQL.
This project defines a modern data pipeline architecture using Airflow, dbt, and PostgreSQL. Below you'll find instructions on how to get started and how the repository is structured.
Open Source Impact
Contributions to external projects
No external contributions found.