A
Public Data AI Preprocessing Pipeline
4.05
Derivation Chain
Step 1
Expanding AI use of public data (Ministry of Interior survey)
→
Step 2
Growing demand for AI training data preprocessing
→
Step 3
Automated preprocessing pipeline specialized for public data
Problem
When AI startups use public data for model training, they find that each API returns a different response format (XML, JSON, or CSV) and that responses often contain Korean encoding errors, inconsistent date formats, and missing values that must be handled. Data engineers spend an average of 2-3 days on preprocessing per API. For teams that use 10-20 APIs per month, simple preprocessing consumes 30-40% of annual data engineering labor costs.
Solution
Enter a public data API URL and the service collects the responses, unifies formats (JSON normalization), fixes encoding, standardizes dates, handles missing values, and removes duplicates in one click, producing a clean dataset ready for AI training. Preprocessing pipelines can be saved as templates and reused for scheduled data collection.
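The preprocessing steps described above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the product's implementation: the function names (`decode_body`, `parse_records`, `standardize_date`, `clean`), the `item` XML tag, the list of date formats, and the missing-value markers are all hypothetical, and fetching the response from the API URL is left out so the example runs offline.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed date layouts; real APIs may use others.
DATE_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%Y.%m.%d", "%Y/%m/%d")


def decode_body(raw: bytes) -> str:
    """Fix encoding: try UTF-8 first, then CP949 (a legacy Korean encoding)."""
    for enc in ("utf-8-sig", "cp949"):
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")


def parse_records(text: str, fmt: str, item_tag: str = "item") -> list[dict]:
    """Unify XML/JSON/CSV into a list of flat dicts (JSON-style records)."""
    if fmt == "json":
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    if fmt == "xml":
        root = ET.fromstring(text)
        # Assumes each record is an <item> element with flat child fields.
        return [{c.tag: c.text for c in item} for item in root.iter(item_tag)]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported format: {fmt}")


def standardize_date(value: str) -> str:
    """Return an ISO 8601 date if a known layout matches, else the input."""
    for layout in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), layout).date().isoformat()
        except ValueError:
            continue
    return value


def clean(records, date_fields=(), missing_markers=("", "-", "N/A", "null")):
    """Handle missing values, standardize dates, and drop exact duplicates."""
    seen, out = set(), []
    for rec in records:
        row = {}
        for key, val in rec.items():
            if isinstance(val, str):
                val = val.strip()
            if val is None or val in missing_markers:
                val = None  # one explicit marker for "missing"
            elif key in date_fields and isinstance(val, str):
                val = standardize_date(val)
            row[key] = val
        fingerprint = json.dumps(row, sort_keys=True, ensure_ascii=False)
        if fingerprint not in seen:
            seen.add(fingerprint)
            out.append(row)
    return out


if __name__ == "__main__":
    body = "city,date,value\n부산,20240106,-\n부산,20240106,-\n".encode("cp949")
    rows = clean(parse_records(decode_body(body), "csv"), date_fields=("date",))
    print(json.dumps(rows, ensure_ascii=False, indent=2))
```

A saved template could then be little more than the arguments to these functions (format, record tag, date fields, missing-value markers) stored alongside the API URL.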
NUMR-V Scores
NUMR-V Scoring System
| N Novelty | 1-5 | How uncommon the service is in market context. |
| U Urgency | 1-5 | How urgently users need this problem solved now. |
| M Market | 1-5 | Market size and growth potential from proxy indicators. |
| R Realizability | 1-5 | Buildability for a small team with realistic constraints. |
| V Validation | 1-5 | Validation signal quality from competition and demand data. |
SaaS N=.15 U=.20 M=.15 R=.30 V=.20
Senior N=.25 U=.25 M=.05 R=.30 V=.15
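The two lines above read as per-dimension weight profiles, and each sums to 1.0. Assuming the overall NUMR-V score is a weighted sum of the five 1-5 dimension scores (the page does not state the formula), the calculation might look like the sketch below. The `weighted_score` function and the sample scores are hypothetical; the page does not list this idea's individual dimension scores.

```python
# Weight profiles copied from the page; the weighted-sum formula is an assumption.
WEIGHTS = {
    "SaaS":   {"N": 0.15, "U": 0.20, "M": 0.15, "R": 0.30, "V": 0.20},
    "Senior": {"N": 0.25, "U": 0.25, "M": 0.05, "R": 0.30, "V": 0.15},
}


def weighted_score(scores: dict, profile: str) -> float:
    """Combine 1-5 dimension scores into one 1-5 score using a weight profile."""
    weights = WEIGHTS[profile]
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly N, U, M, R, V")
    for dim, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{dim} score {value} is outside the 1-5 range")
    return sum(weights[dim] * scores[dim] for dim in weights)


if __name__ == "__main__":
    # Hypothetical dimension scores, for illustration only.
    sample = {"N": 5, "U": 1, "M": 1, "R": 1, "V": 1}
    for profile in WEIGHTS:
        print(profile, round(weighted_score(sample, profile), 2))
```

Because each profile's weights sum to 1.0, the result stays on the same 1-5 scale as the inputs, and the same idea can rank differently under the SaaS and Senior profiles.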
Feasibility (70%)
Data Availability
20.6/25
Feasibility Breakdown
| Tech Complexity | / 40 | Difficulty of core implementation stack. |
| Data Availability | / 25 | Practical availability and cost of required data. |
| MVP Timeline | / 20 | Expected time to ship a usable MVP. |
| API Bonus | / 15 | Bonus for viable public API leverage. |
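The four component maxima add up to 100 (40 + 25 + 20 + 15), which suggests the feasibility percentage is the sum of the component scores. The sketch below assumes that reading. Only the Data Availability value (20.6) comes from the page; the other three values are hypothetical placeholders chosen solely so that the total matches the 70% shown, and `feasibility_percent` is an illustrative name.

```python
# Component maxima from the breakdown table.
FEASIBILITY_MAX = {
    "Tech Complexity": 40,
    "Data Availability": 25,
    "MVP Timeline": 20,
    "API Bonus": 15,
}


def feasibility_percent(components: dict) -> float:
    """Sum capped component scores and express them as a percentage."""
    total = 0.0
    for name, cap in FEASIBILITY_MAX.items():
        value = components[name]
        if not 0 <= value <= cap:
            raise ValueError(f"{name} must be between 0 and {cap}")
        total += value
    return 100 * total / sum(FEASIBILITY_MAX.values())


if __name__ == "__main__":
    example = {
        "Tech Complexity": 28.0,    # hypothetical placeholder
        "Data Availability": 20.6,  # from the page
        "MVP Timeline": 14.0,       # hypothetical placeholder
        "API Bonus": 7.4,           # hypothetical placeholder
    }
    print(round(feasibility_percent(example), 1))
```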
Market Validation (61/100)
Validation Breakdown
| Competition | / 20 | Signal quality from competitor landscape. |
| Market Demand | / 20 | Demand proxies from search and mention patterns. |
| Timing | / 20 | Fit with current shifts in tech, behavior, and regulation. |
| Revenue Signals | / 15 | Reference evidence for monetization viability. |
| Pick-Axe Fit | / 15 | How well the concept serves participants in a trend. |
| Solo Buildability | / 10 | Practicality for lean-team implementation. |
Technical Requirements
Backend [medium]
Frontend [low]
Data Pipeline [medium]